Removing rows from a data frame
I have this data.frame:
set.seed(1)
df <- data.frame(id1 = LETTERS[sample(26, 100, replace = TRUE)], id2 = LETTERS[sample(26, 100, replace = TRUE)], stringsAsFactors = FALSE)
and this vector:
vec <- LETTERS[sample(26,10,replace = F)]
I want to remove from df any row in which either df$id1 or df$id2 is not in vec.
Is there a faster way of finding the row indices that meet this condition than this:
rm.idx <- which(!apply(df,1,function(x) all(x %in% vec)))
I use dplyr with a script like this, which returns the rows that meet the removal condition:
library(dplyr)
df1 <- df %>% filter(!(id1 %in% vec) | !(id2 %in% vec))
Looping over the columns might be faster than looping over the rows. So, use lapply to loop over the columns and create a list of logical vectors with %in%, then use Reduce with | to check whether there is any TRUE value for each corresponding row, and use that to subset 'df':
df[Reduce(`|`, lapply(df, `%in%`, vec)),]
If both elements of a row must be in vec, replace | with &:
df[Reduce(`&`, lapply(df, `%in%`, vec)),]
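One rough way to check the speed claim is to time both approaches on a larger data frame and confirm that they flag the same rows. This is only a sketch in base R: the size n and the names big, rm.row, rm.col, t.row and t.col are assumptions made for illustration, and the timings will vary by machine.

```r
set.seed(1)
n <- 1e5  # illustrative size, not from the original post
big <- data.frame(id1 = LETTERS[sample(26, n, replace = TRUE)],
                  id2 = LETTERS[sample(26, n, replace = TRUE)],
                  stringsAsFactors = FALSE)
vec <- LETTERS[sample(26, 10, replace = FALSE)]

# Row-wise version from the question: indices of rows to remove
t.row <- system.time(
  rm.row <- which(!apply(big, 1, function(x) all(x %in% vec)))
)

# Column-wise version: TRUE where both ids are in vec, then negate
t.col <- system.time(
  rm.col <- which(!Reduce(`&`, lapply(big, `%in%`, vec)))
)

# Do both approaches flag the same rows?
identical(unname(rm.row), rm.col)

# Timings side by side
rbind(row.wise = t.row, col.wise = t.col)
```

The column-wise version makes one vectorised %in% call per column, whereas apply() calls the anonymous function once per row.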
Actually, this is also fast:
rm.idx <- unique(which(!(df$id1 %in% vec) | !(df$id2 %in% vec)))
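When it comes to actually dropping the rows, a logical index can be a little safer than negative indices, because df[-rm.idx, ] returns zero rows when rm.idx is empty. A minimal sketch, assuming the df and vec from the question (the names to.drop, df.kept and empty.idx are illustrative):

```r
set.seed(1)
df <- data.frame(id1 = LETTERS[sample(26, 100, replace = TRUE)],
                 id2 = LETTERS[sample(26, 100, replace = TRUE)],
                 stringsAsFactors = FALSE)
vec <- LETTERS[sample(26, 10, replace = FALSE)]

# TRUE for rows where id1 or id2 is not in vec (the rows to remove)
to.drop <- !(df$id1 %in% vec) | !(df$id2 %in% vec)

# Logical subsetting works even when no row is flagged
df.kept <- df[!to.drop, ]

# Pitfall with negative indices: an empty index selects nothing
empty.idx <- integer(0)
nrow(df[-empty.idx, ])  # 0, not nrow(df)
```

If integer indices are needed anyway, guarding the removal with if (length(rm.idx) > 0) avoids the same problem.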