
Cross-referencing a list of lists and appending the sublist if there is a match

Sorry if the title is unclear; I'm having a hard time explaining what I need.

I have a data.frame containing text, e.g.:

text <- c("a",
          "bb",
          "c ccc",
          "fff")

text_df <- data.frame(line = 1:length(text), text = text, stringsAsFactors = FALSE)

Additionally, I have a list of character vectors that I want to cross-reference against it:

lol <- list(c('a', 'aa', 'aaa'),
            c('d', 'dd', 'ddd'),
            c('e', 'ee', 'eee'),
            c('c', 'cc', 'ccc', 'cccc'),
            c('b', 'bb', 'bbb'),
            c('f', 'ff', 'fff'))

What I want to do is this: for every string in every row of text_df, I want to check whether any of the sublists in lol contains that string. If there is a match, I want to append that sublist to the row in text_df.

The end result of this operation should be:

> text_df_new

line          text
   1      a aa aaa
   2      b bb bbb
   3 c cc ccc cccc
   4      f ff fff

I can't really work out how to go about it. The pseudocode, I imagine, would look something like this:

for text in texts:
    for l in lol:
        if strsplit(text[text]) %in% lol[l]:
            text <- c(text, lol[l])

Or maybe there is a way to vectorize this?

I think this accomplishes what you want, given your data above:

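check_text <- function(df, lookup){
  tdf <- df
  for(i in seq_along(df$text)){
    # split the row's text into individual words
    x <- unlist(strsplit(df$text[i], split = " "))
    for(j in x){
      # loop over the sublists passed in as an argument (not the global lol)
      for(k in lookup){
        for(l in k){
          if(j == l){
            # replace the row's text with the whole matching sublist
            tdf$text[i] <- paste(k, collapse = " ")
          }
        }
      }
    }
  }
  return(tdf)
}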


text_df_new <- check_text(text_df, lol)

> text_df_new

  line          text
1    1      a aa aaa
2    2      b bb bbb
3    3 c cc ccc cccc
4    4      f ff fff

I know that this is not a very "R" approach, so I'm guessing that a real R user will know how to do the same thing in about two lines using apply or one of the other functions that I still haven't wrapped my head around. However, if your data set is small, this is probably OK.
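
For what it's worth, here is a rough sketch of how the same idea might look with vapply instead of nested loops. It is not from the original post: expand_row is a hypothetical helper name, the data is copied from the question so the snippet runs on its own, and it assumes words are separated by single spaces. One difference from the loop version: if a row matches several sublists, this keeps the first match, whereas the loop keeps the last.

```r
# Example data, copied from the question so the snippet is self-contained.
text <- c("a", "bb", "c ccc", "fff")
text_df <- data.frame(line = seq_along(text), text = text,
                      stringsAsFactors = FALSE)
lol <- list(c("a", "aa", "aaa"),
            c("d", "dd", "ddd"),
            c("e", "ee", "eee"),
            c("c", "cc", "ccc", "cccc"),
            c("b", "bb", "bbb"),
            c("f", "ff", "fff"))

# expand_row() is a hypothetical helper, not a base R function.
expand_row <- function(txt, groups) {
  # split one row's text into words (assumes single-space separators)
  words <- strsplit(txt, " ", fixed = TRUE)[[1]]
  # TRUE for each sublist that shares at least one word with this row
  hit <- vapply(groups, function(g) any(words %in% g), logical(1))
  # no match: leave the row's text unchanged
  if (!any(hit)) return(txt)
  # otherwise use the first matching sublist, collapsed into one string
  paste(groups[[which(hit)[1]]], collapse = " ")
}

text_df_new <- text_df
text_df_new$text <- vapply(text_df$text, expand_row, character(1),
                           groups = lol, USE.NAMES = FALSE)
text_df_new
```

Rows with no matching sublist are returned as they are, which seemed like the safest default since the question doesn't say what should happen in that case.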
