R: iterating through all elements of several columns to detect a phrase

I am trying to loop through several columns of a data frame that contain text files.

I want to check every entry of columns 7 through 16 to see whether any of the text files contain a certain phrase.

Each time the phrase is detected, I want to increase the count of times it appeared by 1.

This seems pretty straightforward. I think I should iterate through the columns and then the rows, but I can't figure out exactly how to do this.

Any suggestions? Thank you in advance for any insight.

fc_count <- 0

for (col in profiles[7:16]){
  for (row in 1:nrow(profiles)){

    if(isTRUE(grepl("my name is jeff", row)) == TRUE){

      fc_count = fc_count + 1

    }

  }

}

fc_count

We can use lapply to loop over columns 7 to 16 and apply grepl with the pattern, which gives a list of logical vectors. Reduce collapses that list into a single integer vector by adding ( + ) the vectors element-wise, and sum then gives the total.

sum(Reduce(`+`, lapply(profiles[7:16], grepl, pattern = "my name is jeff")))
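To make the intermediate steps visible, here is a minimal sketch on a small made-up data frame. The name toy and its columns are hypothetical stand-ins for profiles, and columns 2:3 play the role of columns 7:16.

```r
# Hypothetical toy data standing in for `profiles`; only columns 2:3 hold text.
toy <- data.frame(
  id = 1:3,
  a  = c("my name is jeff", "hello", "well, my name is jeff"),
  b  = c("nothing here", "my name is jeff", NA),
  stringsAsFactors = FALSE
)

# One logical vector per column: TRUE where the entry contains the phrase.
hits <- lapply(toy[2:3], grepl, pattern = "my name is jeff")

# Element-wise addition gives the number of matching columns for each row.
per_row <- Reduce(`+`, hits)

# Summing the per-row counts gives the overall number of matching entries.
total <- sum(per_row)
total
```

Note that grepl returns FALSE for NA entries, so missing values do not need special handling here.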

Because grepl is vectorized over its input vector, converting the 'data.frame' to a matrix (a matrix is a vector with dim attributes) gives a more compact version:

sum(grepl("my name is jeff", as.matrix(profiles[7:16])))
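One thing worth keeping in mind is that grepl reports whether an entry contains the phrase, so an entry that repeats the phrase is still counted once. The sketch below, again on a hypothetical toy data frame, shows this along with two standard grepl arguments, fixed and ignore.case, which may or may not be wanted depending on the data.

```r
# Hypothetical toy data; the third entry of `a` repeats the phrase twice.
toy <- data.frame(
  id = 1:3,
  a  = c("My name is Jeff", "hello", "my name is jeff. my name is jeff"),
  b  = c("nothing", "my name is jeff", NA),
  stringsAsFactors = FALSE
)

# Character matrix with one cell per entry of the selected columns.
m <- as.matrix(toy[2:3])

# fixed = TRUE treats the pattern as a literal string rather than a regex.
n_exact <- sum(grepl("my name is jeff", m, fixed = TRUE))

# ignore.case = TRUE also matches "My name is Jeff".
n_any_case <- sum(grepl("my name is jeff", m, ignore.case = TRUE))

c(exact = n_exact, any_case = n_any_case)
```

Here n_exact is 2 and n_any_case is 3: the doubled phrase contributes one match, and the NA cell contributes none.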

Also, if we do use a for loop, the nested loop is not needed, because grepl is vectorized:

fc_count <- 0
for (prf in profiles[7:16]) {
  fc_count <- fc_count + sum(grepl("my name is jeff", prf))
}
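For reference, the nested loop in the question always ends with a count of 0 because grepl is given row, which is an integer index, rather than the text stored at that position. If a nested loop is preferred for readability, a corrected version might look like the sketch below (toy is a hypothetical stand-in for profiles, with columns 2:3 in place of 7:16).

```r
# Hypothetical toy data standing in for `profiles`.
toy <- data.frame(
  id = 1:3,
  a  = c("my name is jeff", "hello", "well, my name is jeff"),
  b  = c("nothing here", "my name is jeff", NA),
  stringsAsFactors = FALSE
)

fc_count <- 0
for (col in toy[2:3]) {           # `col` is the column's vector of values
  for (row in seq_along(col)) {   # `row` is only an index into that vector
    # Test the entry itself, col[row], not the index `row`.
    if (grepl("my name is jeff", col[row])) {
      fc_count <- fc_count + 1
    }
  }
}
fc_count
```

This gives the same total as the vectorized versions, just with more code and more iterations.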
