Subset R data frame based on string matches in two columns
I have a data frame with three columns and thousands of rows. The first two columns (x and y) contain character strings, and the third (z) contains numeric data. I need to subset the data frame based on matching values in both of the first two columns.
x <- c("a", "b", "c", "d", "f", "g", "h", "i", "j", "k")
y <- c("h", "b", "k", "a", "g", "d", "i", "c", "f", "j")
z <- c(1:10)
df <- data.frame(x, y, z)
   x y  z
1  a h  1
2  b b  2
3  c k  3
4  d a  4
5  f g  5
6  g d  6
7  h i  7
8  i c  8
9  j f  9
10 k j 10
Say this is my table, and the values I am interested in are "a", "c", "f", "h" and "k". I only want to return the rows in which both x and y contain one of the five, so in this case rows 1 and 3.
I've tried:
df2 <- filter(df,
x == ("a" | "c" | "f" | "h" | "k") &
y == ("a" | "c" | "f" | "h" | "k"))
but this doesn't work for factors or character strings. Is there an equivalent or another way around this?
Thanks in advance.
I think this returns what you are looking for:
# build vector of necessary elements
mustHaves <- c("a", "c", "f", "h", "k")
# perform subsetting
df[with(df, x %in% mustHaves & y %in% mustHaves),]
  x y z
1 a h 1
3 c k 3
Data (built with stringsAsFactors = FALSE so that x and y are character vectors rather than factors):
df <- data.frame(x, y, z, stringsAsFactors = FALSE)
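For anyone wondering why the attempt in the question fails while `%in%` works, here is a minimal base R sketch. It rebuilds the question's data; the names `wanted`, `attempt`, `by_equality`, `by_membership`, `keep` and `result` are made up for illustration and are not from the original posts.

```r
# Rebuild the example data from the question.
x <- c("a", "b", "c", "d", "f", "g", "h", "i", "j", "k")
y <- c("h", "b", "k", "a", "g", "d", "i", "c", "f", "j")
df <- data.frame(x, y, z = 1:10, stringsAsFactors = FALSE)

wanted <- c("a", "c", "f", "h", "k")  # hypothetical name for the lookup set

# `|` is a logical operator, so applying it to two strings raises an error.
attempt <- tryCatch("a" | "c", error = function(e) "error")

# `==` against a longer vector compares element by element (with recycling);
# it does not test membership, so it quietly returns the wrong rows.
by_equality <- df$x == wanted

# `%in%` checks each element of x against the whole set.
by_membership <- df$x %in% wanted

# Keep only rows where both columns hold one of the wanted values.
keep <- df$x %in% wanted & df$y %in% wanted
result <- df[keep, ]
print(result)  # rows 1 and 3 of the example data
```

Storing the values once in a vector also avoids typing the same list twice, which matters more as the list grows.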
With dplyr:
library(dplyr)  # provides filter() for data frames
df2 <- filter(df,
              x %in% c("a", "c", "f", "h", "k") &
              y %in% c("a", "c", "f", "h", "k"))
df2
  x y z
1 a h 1
2 c k 3
What about:
df2 <- filter(df, grepl("[acfhk]",x) & grepl("[acfhk]",y))
using the dplyr package.
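One caveat with the regular-expression approach: `grepl("[acfhk]", ...)` returns TRUE whenever one of those letters appears anywhere in the string. That is equivalent to `%in%` for the single-character values in the question, but it may over-match on longer strings. A small base R sketch, using made-up values that are not from the original data:

```r
# Hypothetical values, some longer than one character.
values <- c("a", "apple", "b", "hk", "k")

# Unanchored character class: TRUE if any of the letters occurs anywhere.
loose <- grepl("[acfhk]", values)

# Anchored pattern: the whole string must be exactly one of the letters.
strict <- grepl("^[acfhk]$", values)

# Exact set membership, for comparison.
exact <- values %in% c("a", "c", "f", "h", "k")

print(data.frame(values, loose, strict, exact))
```

If the real columns can hold multi-character strings, anchoring the pattern or using `%in%` is probably the safer choice.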