How to grep a group based on string in another column that doesn't occur in each observation using R?

Have to simplify a previous question that failed.

I want to extract whole groups, identified by 'id', that contain a string ('inter' or 'high') in another column called 'strmatch'. The string doesn't occur in every observation of the group, but if it occurs I want to assign the whole group to the respective data frame.

The data frame

df <- data.frame(id = c("a", "a", "b", "b","c", "c","d","d"),
                 std = c("y", "y","n","n","y","y","n","n"),
                 strmatch = c("alpha","TMB-inter","beta","TMB-high","gamma","delta","epsilon","TMB-inter"))

Looks like this

id  std strmatch
a   y   alpha
a   y   TMB-inter
b   n   beta
b   n   TMB-high
c   y   gamma
c   y   delta
d   n   epsilon
d   n   TMB-inter

Expected result

dfa

id  std strmatch
a   y   alpha
a   y   TMB-inter
d   n   epsilon
d   n   TMB-inter

dfb

id  std strmatch
b   n   beta
b   n   TMB-high

dfc

id  std strmatch
c   y   gamma
c   y   delta

What I've tried

split(df, grepl("high", df$strmatch))

This gives only two data frames: one with the row containing 'high' and another with the rest.

Thanks a lot for your help.

You could split this into two parts. First find the values that match "inter|high" and build a separate data frame for each one, containing every row of the ids where that value occurs. Then collect the ids that do not match any of unique_vals into one more data frame.

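# Distinct strmatch values that contain "inter" or "high"
unique_vals <- unique(grep("inter|high", df$strmatch, value = TRUE))

# One data frame per matched value (all rows of every id that has it),
# plus a final data frame for the ids that match none of them
c(lapply(unique_vals, function(x) subset(df, id %in% id[strmatch == x])),
  list(subset(df, !id %in% id[strmatch %in% unique_vals])))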


#[[1]]
#  id std  strmatch
#1  a   y     alpha
#2  a   y TMB-inter
#7  d   n   epsilon
#8  d   n TMB-inter

#[[2]]
#  id std strmatch
#3  b   n     beta
#4  b   n TMB-high

#[[3]]
#  id std strmatch
#5  c   y    gamma
#6  c   y    delta
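
If you would rather have the results named dfa, dfb and dfc as in the question, the same idea can be sketched with a named vector of keywords. This is only a rough variant, not a definitive solution. The names keywords, hits and res are made up for illustration, and it assumes that matching by pattern ('inter', 'high') rather than by exact strmatch value is what you want. An id that contains both keywords would end up in both data frames.

```r
# Same example data as in the question
df <- data.frame(
  id = c("a", "a", "b", "b", "c", "c", "d", "d"),
  std = c("y", "y", "n", "n", "y", "y", "n", "n"),
  strmatch = c("alpha", "TMB-inter", "beta", "TMB-high",
               "gamma", "delta", "epsilon", "TMB-inter")
)

# Hypothetical mapping: name of the result data frame -> keyword to look for
keywords <- c(dfa = "inter", dfb = "high")

# For each keyword, the ids that have at least one matching row
hits <- lapply(keywords, function(k) unique(df$id[grepl(k, df$strmatch)]))

# Keep every row of those ids, not only the rows that matched
res <- lapply(hits, function(ids) df[df$id %in% ids, ])

# Ids that matched no keyword at all go into the remaining data frame
res$dfc <- df[!df$id %in% unlist(hits), ]

res
```

res is a named list, so res$dfa, res$dfb and res$dfc can be used directly. If you really need separate objects in your workspace, list2env(res, envir = globalenv()) should create them.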
