R Digest: Subsetting Data
Published: 2019-06-24


Original article: https://www.statmethods.net/management/subset.html


R has powerful indexing features for accessing object elements. These features can be used to select and exclude variables and observations. The following code snippets demonstrate ways to keep or delete variables and observations and to take random samples from a dataset.

Selecting (Keeping) Variables

# select variables v1, v2, v3
myvars <- c("v1", "v2", "v3")
newdata <- mydata[myvars]
# another method
myvars <- paste("v", 1:3, sep="")
newdata <- mydata[myvars]
# select the 1st variable and the 5th through 10th variables
newdata <- mydata[c(1,5:10)]
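Here is a minimal runnable sketch of the selection idioms above. It uses a small made-up data frame in place of `mydata`; the columns and values are hypothetical, not from the original article. It also shows a behavior worth knowing. Indexing a data frame with a single argument, as in `mydata[myvars]`, always returns a data frame. Matrix-style indexing such as `mydata[, "v2"]` drops a single column to a plain vector.

```r
# Hypothetical toy data frame standing in for `mydata` (not from the article)
mydata <- data.frame(
  v1 = 1:4, v2 = 5:8, v3 = 9:12,
  v4 = letters[1:4], v5 = c(TRUE, FALSE, TRUE, FALSE)
)

# Selecting by a character vector of names keeps columns in the order given
by_name <- mydata[c("v3", "v1")]
print(names(by_name))  # [1] "v3" "v1"

# paste() with sep = "" builds the same name vector as c("v1", "v2", "v3")
myvars <- paste("v", 1:3, sep = "")
print(identical(myvars, c("v1", "v2", "v3")))  # [1] TRUE

# Selecting by position: column 1 plus columns 2 through 4
by_pos <- mydata[c(1, 2:4)]
print(names(by_pos))  # [1] "v1" "v2" "v3" "v4"

# One-argument indexing always returns a data frame, even for one column ...
one_col <- mydata["v2"]
print(class(one_col))  # [1] "data.frame"

# ... while matrix-style indexing drops a single column to a vector by default
print(class(mydata[, "v2"]))  # [1] "integer"
```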

Excluding (DROPPING) Variables

# exclude variables v1, v2, v3
myvars <- names(mydata) %in% c("v1", "v2", "v3")
newdata <- mydata[!myvars]
# exclude 3rd and 5th variables
newdata <- mydata[c(-3,-5)]
# delete variables v3 and v5
mydata$v3 <- mydata$v5 <- NULL
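The sketch below checks what each exclusion idiom returns, again on a hypothetical toy `mydata`. It also shows a name-based alternative using `setdiff()`. Negative indexing only works with positions: `mydata[-c("v3", "v5")]` fails because unary minus cannot be applied to a character vector.

```r
# Hypothetical toy data frame standing in for `mydata`
mydata <- data.frame(v1 = 1:3, v2 = 4:6, v3 = 7:9, v4 = 10:12, v5 = 13:15)

# %in% gives one TRUE/FALSE per column name; ! turns "drop" flags into "keep" flags
drop_flags <- names(mydata) %in% c("v1", "v2", "v3")
print(drop_flags)        # [1]  TRUE  TRUE  TRUE FALSE FALSE
kept <- mydata[!drop_flags]
print(names(kept))       # [1] "v4" "v5"

# Negative positions drop the 3rd and 5th columns
by_pos <- mydata[c(-3, -5)]
print(names(by_pos))     # [1] "v1" "v2" "v4"

# Name-based equivalent: keep every column that is not listed
by_name <- mydata[setdiff(names(mydata), c("v3", "v5"))]
print(names(by_name))    # [1] "v1" "v2" "v4"

# Assigning NULL removes columns; work on a copy so mydata itself is unchanged
trimmed <- mydata
trimmed$v3 <- trimmed$v5 <- NULL
print(names(trimmed))    # [1] "v1" "v2" "v4"
print(ncol(mydata))      # [1] 5
```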

Selecting Observations

# first 5 observations
newdata <- mydata[1:5,]
# based on variable values
newdata <- mydata[which(mydata$gender=='F' & mydata$age > 65), ]
# or
attach(mydata)
newdata <- mydata[which(gender=='F' & age > 65), ]
detach(mydata)
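The original code wraps the condition in `which()`. The toy example below uses hypothetical data with one missing age to show why that matters. A plain logical index turns an `NA` comparison into an all-`NA` row, while `which()` keeps only the rows where the condition is `TRUE`. The example also uses base R's `with()` as an alternative to `attach()`/`detach()` for referring to columns by bare name.

```r
# Hypothetical toy data; row 3 has a missing age
mydata <- data.frame(
  gender = c("F", "M", "F", "F"),
  age    = c(70, 80, NA, 66)
)

# Plain logical indexing: the NA in row 3 yields a row filled with NA
plain <- mydata[mydata$gender == "F" & mydata$age > 65, ]
print(nrow(plain))  # [1] 3

# which() returns positions of TRUE values only, so the NA row is dropped
safe <- mydata[which(mydata$gender == "F" & mydata$age > 65), ]
print(safe$age)     # [1] 70 66

# with() evaluates the condition inside mydata without attaching it
safe2 <- mydata[with(mydata, which(gender == "F" & age > 65)), ]
print(identical(safe, safe2))  # [1] TRUE
```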

Selection using the Subset Function

The subset() function is the easiest way to select variables and observations. In the following example, we select all rows where age is greater than or equal to 20 or less than 10, and keep only the ID and Weight columns.

# using subset function
newdata <- subset(mydata, age >= 20 | age < 10,
                  select=c(ID, Weight))
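Here is a runnable version of the first `subset()` call, using a hypothetical toy data frame. R's documentation for `subset()` says that rows where the condition evaluates to `NA` are treated as `FALSE`, so the row with a missing age is dropped. The bracket-and-`which()` form returns the same rows.

```r
# Hypothetical toy data frame; row 4 has a missing age
mydata <- data.frame(
  ID     = 1:5,
  age    = c(5, 15, 20, NA, 42),
  Weight = c(18, 50, 65, 70, 80),
  Height = c(105, 160, 170, 175, 180)
)

# Keep rows with age >= 20 or age < 10, and only the ID and Weight columns
newdata <- subset(mydata, age >= 20 | age < 10, select = c(ID, Weight))
print(newdata$ID)     # [1] 1 3 5

# Equivalent bracket form: which() drops the NA row just as subset() does
alt <- mydata[which(mydata$age >= 20 | mydata$age < 10), c("ID", "Weight")]
print(identical(newdata$Weight, alt$Weight))  # [1] TRUE
```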

In the next example, we select all men over the age of 25 and we keep variables weight through income (weight, income and all columns between them).

# using subset function (part 2)
newdata <- subset(mydata, sex=="m" & age > 25,
                  select=weight:income)
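In `select`, the `weight:income` range is resolved by column position, so what it covers depends on the column order of the data frame. The sketch below shows this on a hypothetical toy data frame. It also shows that a negated range, `-(weight:income)`, drops that same block of columns instead.

```r
# Hypothetical toy data frame; column order determines what weight:income covers
mydata <- data.frame(
  sex    = c("m", "f", "m", "m"),
  age    = c(30, 40, 22, 51),
  weight = c(80, 60, 75, 90),
  height = c(180, 165, 178, 185),
  income = c(50000, 62000, 31000, 70000),
  zip    = c("10001", "10002", "10003", "10004")
)

# Men older than 25, keeping weight, income, and every column between them
newdata <- subset(mydata, sex == "m" & age > 25, select = weight:income)
print(names(newdata))  # [1] "weight" "height" "income"
print(nrow(newdata))   # [1] 2

# Negating the range keeps everything except that block of columns
dropped <- subset(mydata, sex == "m" & age > 25, select = -(weight:income))
print(names(dropped))  # [1] "sex" "age" "zip"
```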

Random Samples

Use the sample() function to take a random sample of size n from a dataset.

# take a random sample of size 50 from a dataset mydata
# sample without replacement
mysample <- mydata[sample(1:nrow(mydata), 50,
                          replace=FALSE),]
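Here is a minimal sketch of reproducible sampling, using a hypothetical 200-row toy data frame. Calling `set.seed()` first makes `sample()` return the same rows on every run. Without replacement, each row appears at most once. With `replace = TRUE`, as used for bootstrap resamples, rows can repeat. Asking for more rows than exist with `replace = FALSE` is an error.

```r
# Hypothetical toy data frame standing in for `mydata`
mydata <- data.frame(id = 1:200, value = sqrt(1:200))

# Fix the random seed so the sample is reproducible
set.seed(42)
idx <- sample(seq_len(nrow(mydata)), 50, replace = FALSE)
mysample <- mydata[idx, ]
print(nrow(mysample))              # [1] 50
print(anyDuplicated(mysample$id))  # [1] 0  (no row drawn twice)

# Re-seeding with the same value reproduces exactly the same indices
set.seed(42)
idx_again <- sample(seq_len(nrow(mydata)), 50, replace = FALSE)
print(identical(idx, idx_again))   # [1] TRUE

# Sampling with replacement, e.g. for a bootstrap resample of the same size
boot <- mydata[sample(seq_len(nrow(mydata)), nrow(mydata), replace = TRUE), ]
print(nrow(boot))                  # [1] 200
```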

Reposted from: https://www.cnblogs.com/chickenwrap/p/10166562.html
