Subset multiple rows with condition
I have a .txt file read into a table called power with over 2 million observations of 9 variables. I am trying to subset power to the rows whose Date is either "01/02/2007" or "02/02/2007". After creating the subset, the RStudio environment said I ended up with zero observations, but the same variables.
How can I get a subset of the data containing only the rows dated "01/02/2007" or "02/02/2007"?
I saw a similar post, but still got an error on my dataset. See link: Select multiple rows conditioning on ID in R
My data:
#load data
> power <- read.table("textfile.txt", stringsAsFactors = FALSE, head = TRUE)
#first column, called Date
> head(power$Date)
#[1] 16/12/2006 16/12/2006 16/12/2006 16/12/2006 16/12/2006 16/12/2006
> str(power$Date)
chr [1:2075259] "16/12/2006" "16/12/2006" "16/12/2006" "16/12/2006" ...
My code:
> subpower <- subset(power, Date %in% c("01/02/2007", "02/02/2007"))
Subset data:
> str(powersub$Date)
chr(0)
I am guessing that your dataset may have leading/trailing spaces in the Date column, because with the example data below the subset works:
subset(power, Date %in% c("01/02/2007", "02/02/2007"))
# Date Val
#1 01/02/2007 14
#8 02/02/2007 28
If I change those two values to include spaces:
power$Date[1] <- '01/02/2007 '
power$Date[8] <- ' 02/02/2007'
subset(power, Date %in% c("01/02/2007", "02/02/2007"))
#[1] Date Val
<0 rows> (or 0-length row.names)
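To check whether stray whitespace is really the cause in the full dataset, one could count the Date values that start or end with a whitespace character. A minimal base R sketch, using a small made-up data frame in place of the real file (the sample values and the name has_ws are illustrative assumptions):

```r
# Made-up data mimicking the suspected problem: two Date values carry
# a trailing or leading space
power <- data.frame(
  Date = c("01/02/2007 ", "16/12/2006", " 02/02/2007"),
  Val  = c(14L, 24L, 28L),
  stringsAsFactors = FALSE
)

# TRUE for every Date value that starts or ends with whitespace
has_ws <- grepl("^\\s|\\s$", power$Date)
sum(has_ws)          # how many values are affected
power$Date[has_ws]   # which ones; the printed quotes make the spaces visible

# A clean "dd/mm/yyyy" string is always 10 characters long
table(nchar(power$Date))
```

On the real data, anything other than 10 in the nchar() table points to extra characters around the dates.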
You could use str_trim from stringr:
library(stringr)
subset(power, str_trim(Date) %in% c('01/02/2007', '02/02/2007'))
# Date Val
#1 01/02/2007 14
#8 02/02/2007 28
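If adding a package dependency is undesirable, base R's trimws() (available since R 3.2.0) does the same job. A minimal sketch on made-up data that cleans the column once, so every later comparison on Date works without trimming again:

```r
# Made-up example data with padded Date values
power <- data.frame(
  Date = c("01/02/2007 ", "16/12/2006", " 02/02/2007"),
  Val  = c(14L, 24L, 28L),
  stringsAsFactors = FALSE
)

# Strip leading/trailing whitespace from the whole column once
power$Date <- trimws(power$Date)

subpower <- subset(power, Date %in% c("01/02/2007", "02/02/2007"))
subpower
```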
or use gsub:
subset(power, gsub("^ +| +$", "", Date) %in% c('01/02/2007', '02/02/2007'))
# Date Val
#1 01/02/2007 14
#8 02/02/2007 28
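Note that the pattern "^ +| +$" only removes space characters. If the file might contain tabs (an assumption; the question does not say), "\\s+" covers those too. A small sketch with one hypothetical tab-padded value:

```r
# Made-up data: the first value ends in a tab, the last starts with a space
power <- data.frame(
  Date = c("01/02/2007\t", "16/12/2006", " 02/02/2007"),
  Val  = c(14L, 24L, 28L),
  stringsAsFactors = FALSE
)
wanted <- c("01/02/2007", "02/02/2007")

# " +" strips only spaces, so the tab-padded row is still missed
nrow(subset(power, gsub("^ +| +$", "", Date) %in% wanted))      # 1
# "\\s+" strips spaces, tabs and other whitespace characters
nrow(subset(power, gsub("^\\s+|\\s+$", "", Date) %in% wanted))  # 2
```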
Another option, without removing the spaces, would be to use grepl:
subset(power, grepl('01/02/2007|02/02/2007', Date))
# Date Val
#1 01/02/2007 14
#8 02/02/2007 28
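Because grepl matches the pattern anywhere inside the string, it also keeps values that merely contain one of the dates. Anchoring the pattern while still allowing surrounding whitespace keeps the match exact. A sketch with a hypothetical annotated fourth value added to show the difference:

```r
# Made-up data; the fourth value is hypothetical, to show a substring match
power <- data.frame(
  Date = c("01/02/2007 ", "16/12/2006", " 02/02/2007", "01/02/2007 (est.)"),
  Val  = c(14L, 24L, 28L, 30L),
  stringsAsFactors = FALSE
)

# Unanchored: any string containing either date matches
grepl("01/02/2007|02/02/2007", power$Date)     # TRUE FALSE TRUE TRUE

# Anchored: only optional whitespace may surround the date itself
grepl("^\\s*(01|02)/02/2007\\s*$", power$Date)  # TRUE FALSE TRUE FALSE

subset(power, grepl("^\\s*(01|02)/02/2007\\s*$", Date))
```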
power <- structure(list(Date = c("01/02/2007", "16/12/2006", "16/12/2006",
"16/12/2006", "16/12/2006", "16/12/2006", "16/12/2006", "02/02/2007"
), Val = c(14L, 24L, 23L, 22L, 23L, 25L, 23L, 28L)), .Names = c("Date",
"Val"), class = "data.frame", row.names = c(NA, -8L))
Try:
> subpower = power[power$Date %in% c("01/02/2007", "02/02/2007") ,]
> subpower
Date Val
1 01/02/2007 14
8 02/02/2007 28
(Using the power data from @akrun's answer)
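The important detail in that line is %in% rather than ==. Writing power$Date == c("01/02/2007", "02/02/2007") compares element by element and recycles the two-element vector, which silently drops matching rows. A small sketch on made-up data:

```r
# Made-up data: every row has one of the two wanted dates
power <- data.frame(
  Date = c("01/02/2007", "01/02/2007", "02/02/2007", "02/02/2007"),
  Val  = c(14L, 15L, 28L, 29L),
  stringsAsFactors = FALSE
)
wanted <- c("01/02/2007", "02/02/2007")

# `==` recycles `wanted`: row 1 vs "01/02/2007", row 2 vs "02/02/2007",
# row 3 vs "01/02/2007", row 4 vs "02/02/2007"
power[power$Date == wanted, ]    # keeps only rows 1 and 4

# `%in%` asks, for each row, whether Date is any of the wanted values
power[power$Date %in% wanted, ]  # keeps all four rows
```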
Moreover, your own code will work if you use the correct name for the subset: "subpower" instead of "powersub"!
> subpower <- subset(power, Date %in% c("01/02/2007", "02/02/2007"))
> subpower
Date Val
1 01/02/2007 14
8 02/02/2007 28
>
> str(subpower)
'data.frame': 2 obs. of 2 variables:
$ Date: chr "01/02/2007" "02/02/2007"
$ Val : int 14 28
Your approach is correct; try using
power <- read.table("textfile.txt", stringsAsFactors = FALSE)
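If the whitespace comes from the file itself, it can also be removed while reading. read.table's strip.white = TRUE trims leading and trailing whitespace from unquoted character fields, but it only takes effect when sep is given explicitly. A minimal sketch, assuming a semicolon-separated file; the separator and the inline sample text (passed via text = instead of "textfile.txt") are assumptions, not taken from the question:

```r
# Hypothetical file contents with padded dates; the real separator may differ
txt <- "Date;Val
01/02/2007 ;14
16/12/2006;24
 02/02/2007;28"

# strip.white is only honoured when sep is specified
power <- read.table(text = txt, sep = ";", header = TRUE,
                    stringsAsFactors = FALSE, strip.white = TRUE)

subset(power, Date %in% c("01/02/2007", "02/02/2007"))
```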