R - Exclude rows from a dataframe that don't contain certain values

Question

I have a huge dataframe from which I need to remove every row that doesn't contain any of the values in a vector (named "codes").

Example dataframe:

df <- data.frame(ID = as.integer(c(10001, 10002, 10003, 10004, 10005)),
                 X1 = as.integer(c(150, 120, 175, 160, 1)),
                 X2 = as.integer(c(1, 1412415, 16420, 19920, 150)))
> df
     ID  X1      X2
1 10001 150       1
2 10002 120 1412415
3 10003 175   16420
4 10004 160   19920
5 10005   1     150

codes <- c(120, 150)
codes <- as.integer(codes)

I have tried multiple options; here is one that failed:

newdf <- df[do.call(paste, df[2:3]) %in% codes,]

> newdf
[1] ID X1 X2
<0 rows> (or 0-length row.names)

Instead, newdf should contain rows 1, 2 and 5, with IDs 10001, 10002 and 10005:

> newdf
     ID  X1      X2
1 10001 150       1
2 10002 120 1412415
5 10005   1     150
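
The attempt above returns zero rows because do.call(paste, df[2:3]) glues X1 and X2 of each row into a single string such as "150 1", and no such string equals 120 or 150. A minimal base-R sketch (an illustration, not one of the posted answers) that shows this and then tests each column separately before combining the results with OR:

```r
# Re-create the example data (IDs matching the printed df) and the codes
df <- data.frame(ID = as.integer(c(10001, 10002, 10003, 10004, 10005)),
                 X1 = as.integer(c(150, 120, 175, 160, 1)),
                 X2 = as.integer(c(1, 1412415, 16420, 19920, 150)))
codes <- as.integer(c(120, 150))

# Why the attempt fails: paste() joins X1 and X2 of each row into one string
pasted <- do.call(paste, df[2:3])
print(pasted)             # c("150 1", "120 1412415", "175 16420", "160 19920", "1 150")
print(pasted %in% codes)  # all FALSE, so df[..., ] selects no rows

# Fix: test each X column on its own, giving one logical vector per column
hits <- lapply(df[c("X1", "X2")], function(col) col %in% codes)
# Combine the per-column results with OR: keep a row if any column matches
keep <- Reduce(`|`, hits)
newdf <- df[keep, ]
print(newdf)              # rows 1, 2 and 5
```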

Answer 1

This approach is nice because it scales to any number of columns whose names begin with X:

library(dplyr)

filter_at(df, vars(starts_with("X")), any_vars(. %in% codes))
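
The scoped verbs such as filter_at() and any_vars() are superseded in current dplyr releases (still exported, but no longer recommended); dplyr 1.0.4 and later provide if_any() for the same "keep the row if any selected column matches" test. A hedged sketch, assuming a recent dplyr is installed, that runs both forms on the example data so they can be compared:

```r
library(dplyr)

df <- data.frame(ID = as.integer(c(10001, 10002, 10003, 10004, 10005)),
                 X1 = as.integer(c(150, 120, 175, 160, 1)),
                 X2 = as.integer(c(1, 1412415, 16420, 19920, 150)))
codes <- as.integer(c(120, 150))

# Scoped-verb form from the answer: superseded, but still works
old_way <- filter_at(df, vars(starts_with("X")), any_vars(. %in% codes))

# if_any() (assumes dplyr >= 1.0.4): TRUE when the predicate holds
# for at least one of the columns selected by starts_with("X")
new_way <- filter(df, if_any(starts_with("X"), ~ .x %in% codes))
print(new_way)  # IDs 10001, 10002 and 10005
```

Because starts_with("X") is evaluated against the column names, both forms pick up X3, X4 and so on without changing the code.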

Answer 2

dplyr::filter() does what you want. It takes any logical condition (for example, whether a value is in the vector codes) and removes every row for which the condition is not TRUE:

library(dplyr)

df %>%
  filter(X1 %in% codes | X2 %in% codes)
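
A runnable sketch of Answer 2 on the example data. The condition is built per column and joined with |, so a row survives when either X1 or X2 holds one of the codes. For comparison only, base R's subset() applies the same row condition without the dplyr dependency (row names of the two results may differ, so the check below compares the ID column):

```r
library(dplyr)

df <- data.frame(ID = as.integer(c(10001, 10002, 10003, 10004, 10005)),
                 X1 = as.integer(c(150, 120, 175, 160, 1)),
                 X2 = as.integer(c(1, 1412415, 16420, 19920, 150)))
codes <- as.integer(c(120, 150))

# Keep a row if X1 or X2 holds one of the codes
newdf <- df %>%
  filter(X1 %in% codes | X2 %in% codes)
print(newdf)

# Same row condition with base R's subset()
base_newdf <- subset(df, X1 %in% codes | X2 %in% codes)
print(base_newdf)
```

With many X columns, writing one %in% test per column gets long, which is where the column-selection approach of Answer 1 helps.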

2 answers

Answer 1: 2 votes, accepted, 2020-04-24 11:48:24

Answer 2: 0 votes, 2020-04-24 11:45:54
