How to grep a group based on a string in another column that doesn't occur in each observation using R?
I have to simplify a previous question that failed.
I want to extract whole groups, identified by 'id', that contain a string ('inter' or 'high') in another column called 'strmatch'. The string doesn't occur in every observation of the group, but if it occurs I want to assign the whole group to the respective data frame.
The data frame
df <- data.frame(id = c("a", "a", "b", "b","c", "c","d","d"),
std = c("y", "y","n","n","y","y","n","n"),
strmatch = c("alpha","TMB-inter","beta","TMB-high","gamma","delta","epsilon","TMB-inter"))
Looks like this
id std strmatch
a y alpha
a y TMB-inter
b n beta
b n TMB-high
c y gamma
c y delta
d n epsilon
d n TMB-inter
Expected result
dfa
id std strmatch
a y alpha
a y TMB-inter
d n epsilon
d n TMB-inter
dfb
id std strmatch
b n beta
b n TMB-high
dfc
id std strmatch
c y gamma
c y delta
What I've tried
split(df, grepl("high", df$strmatch))
This gives only two data frames: one with the single row containing 'high' and the other with all remaining rows.
Thanks a lot for your help.
You could maybe divide this into two parts. First find the values which match "inter|high" and break the groups containing them into separate data frames, then find the groups which do not match any of unique_vals.
unique_vals <- unique(grep("inter|high", df$strmatch, value = TRUE))
c(lapply(unique_vals, function(x) subset(df, id %in% id[strmatch == x])),
list(subset(df, !id %in% id[strmatch %in% unique_vals])))
#[[1]]
# id std strmatch
#1 a y alpha
#2 a y TMB-inter
#7 d n epsilon
#8 d n TMB-inter
#[[2]]
# id std strmatch
#3 b n beta
#4 b n TMB-high
#[[3]]
# id std strmatch
#5 c y gamma
#6 c y delta
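If you would rather get a named list, so that the pieces can be assigned to dfa, dfb and dfc directly, one possible variant is to build a group-level label first and then split on it. This is only a sketch under a few assumptions: the names inter_ids, high_ids, group_label and res are made up for illustration, the label "none" is arbitrary, and a group containing both keywords would end up in the "inter" piece.

```r
df <- data.frame(
  id = c("a", "a", "b", "b", "c", "c", "d", "d"),
  std = c("y", "y", "n", "n", "y", "y", "n", "n"),
  strmatch = c("alpha", "TMB-inter", "beta", "TMB-high",
               "gamma", "delta", "epsilon", "TMB-inter")
)

# ids of groups that have at least one row matching each keyword
inter_ids <- unique(df$id[grepl("inter", df$strmatch)])
high_ids  <- unique(df$id[grepl("high", df$strmatch)])

# label every row by what its whole group contains
# (assumption: "inter" wins if a group contains both keywords)
group_label <- ifelse(df$id %in% inter_ids, "inter",
               ifelse(df$id %in% high_ids, "high", "none"))

# split on the group-level label instead of the row-level match
res <- split(df, factor(group_label, levels = c("inter", "high", "none")))

dfa <- res$inter
dfb <- res$high
dfc <- res$none
```

The difference from split(df, grepl("high", df$strmatch)) is that the splitting variable here is the same for every row of an id, so groups stay together.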