R - Exclude rows from dataframe that don't contain certain values
I have a huge dataframe from which I need to remove every row that doesn't contain any of the values in a vector (named "codes").
Example dataframe:
df <- data.frame(ID = as.integer(c(10001, 10002, 10003, 10004, 10005)),
                 X1 = as.integer(c(150, 120, 175, 160, 1)),
                 X2 = as.integer(c(1, 1412415, 16420, 19920, 150)))
> df
     ID  X1      X2
1 10001 150       1
2 10002 120 1412415
3 10003 175   16420
4 10004 160   19920
5 10005   1     150
codes <- c(120, 150)
codes <- as.integer(codes)
I have tried multiple options; here's one failed example:
newdf <- df[do.call(paste, df[2:3]) %in% codes,]
> newdf
[1] ID X1 X2
<0 rows> (or 0-length row.names)
Instead, newdf should contain rows 1, 2 and 5, with ID numbers 10001, 10002 and 10005, like this:
> newdf
     ID  X1      X2
1 10001 150       1
2 10002 120 1412415
5 10005   1     150
This approach is nice because it scales to any number of columns whose names begin with X:
library(dplyr)  # vars(), starts_with() and any_vars() are used unqualified, so dplyr must be attached
filter_at(df, vars(starts_with("X")), any_vars(. %in% codes))
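In recent dplyr releases the scoped verbs such as filter_at() are marked as superseded in favour of if_any() and if_all(). The following is a minimal sketch of the same filter written that way, assuming dplyr 1.0.4 or later is installed; the data is copied from the question so the snippet runs on its own.

```r
library(dplyr)

# Example data from the question
df <- data.frame(ID = 10001:10005,
                 X1 = c(150L, 120L, 175L, 160L, 1L),
                 X2 = c(1L, 1412415L, 16420L, 19920L, 150L))
codes <- c(120L, 150L)

# Keep a row if ANY column whose name starts with "X" holds a value in codes
newdf <- filter(df, if_any(starts_with("X"), ~ .x %in% codes))
newdf
```

Swapping if_any() for if_all() would instead keep only rows where every X column holds one of the codes.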
dplyr::filter() does what you want. It takes any condition (for example, whether a value is an element of a vector) and removes all rows that do not match it:
library(dplyr)
df %>%
filter(X1 %in% codes | X2 %in% codes)
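For completeness, here is a base R sketch that avoids dplyr and also shows why the attempt in the question returned zero rows: paste() glues X1 and X2 into a single string per row, and such a string can never equal one of the codes. The names x_cols, pasted and keep are only illustrative.

```r
# Example data from the question
df <- data.frame(ID = 10001:10005,
                 X1 = c(150L, 120L, 175L, 160L, 1L),
                 X2 = c(1L, 1412415L, 16420L, 19920L, 150L))
codes <- c(120L, 150L)

# All columns whose names start with "X"
x_cols <- grep("^X", names(df), value = TRUE)

# What the failed attempt compared against codes: one string per row,
# e.g. "150 1", which is never "120" or "150"
pasted <- do.call(paste, df[x_cols])

# Test each X column separately, then OR the logical vectors together
keep <- Reduce(`|`, lapply(df[x_cols], `%in%`, codes))
newdf <- df[keep, ]
newdf
```

Because the columns are selected by name pattern, this should keep working if more X columns are added later.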