R: Iterating through all elements of several columns to detect a phrase
I am trying to loop through several columns in a data frame that contain text files.
I want to check every entry of columns 7 through 16 to see whether any of the text files contain a certain phrase.
Each time the phrase is detected, I want to increase a count of its appearances by 1.
This seems pretty straightforward. I think I should iterate over the columns and then over the rows, but I can't figure out exactly how to do this.
Any suggestions? Thank you in advance for any insight.
fc_count <- 0
for (col in profiles[7:16]){
  for (row in 1:nrow(profiles)){
    if(isTRUE(grepl("my name is jeff", row)) == TRUE){
      fc_count = fc_count + 1
    }
  }
}
fc_count
We can use lapply to loop over columns 7 to 16 and apply grepl with the pattern to get a list of logical vectors, Reduce it to a single integer vector by adding (+), and then get the total with sum:
sum(Reduce(`+`, lapply(profiles[7:16], grepl, pattern = "my name is jeff")))
Because grepl is vectorized over its input, converting the data.frame to a matrix (a matrix is a vector with a dim attribute) makes this more compact:
sum(grepl("my name is jeff", as.matrix(profiles[7:16])))
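A side note on the pattern itself, as a minimal sketch: grepl treats its pattern as a regular expression and matches case-sensitively by default, and it returns FALSE for NA entries. The vector `entries` below is hypothetical toy data standing in for the cells of profiles[7:16]; whether literal or case-insensitive matching is wanted depends on the data.

```r
# Hypothetical entries, standing in for cells of profiles[7:16]
entries <- c("My name is Jeff", "my name is jeff.", "something else", NA)

# Default: pattern is a regular expression, matching is case-sensitive
default_hits <- grepl("my name is jeff", entries)

# fixed = TRUE: treat the pattern as a literal string, not a regex
literal_hits <- grepl("my name is jeff", entries, fixed = TRUE)

# ignore.case = TRUE: also matches "My name is Jeff"
n_any_case <- sum(grepl("my name is jeff", entries, ignore.case = TRUE))
```

Note that grepl reports whether an entry contains the phrase at all, so an entry containing the phrase twice still adds only 1 to the total.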
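Also, with a for loop we don't need nested loops, because grepl is vectorized over each column. (In the original code, grepl was applied to the row index rather than to the text in the cell, so it could never match.)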
fc_count <- 0
for (prf in profiles[7:16]) {
  fc_count <- fc_count + sum(grepl("my name is jeff", prf))
}
fc_count
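To check that the three approaches agree, here is a small self-contained sketch. The data frame below is hypothetical toy data (16 character columns, 3 rows) standing in for the real `profiles`, and the names `phrase`, `n_reduce`, `n_matrix` and `n_loop` are introduced only for illustration.

```r
# Toy stand-in for `profiles`: 16 character columns, 3 rows (hypothetical data)
profiles <- as.data.frame(
  matrix("nothing to see here", nrow = 3, ncol = 16),
  stringsAsFactors = FALSE
)
profiles[1, 7]  <- "hi, my name is jeff"
profiles[3, 12] <- "my name is jeff and i like r"
profiles[2, 2]  <- "my name is jeff"  # outside columns 7:16, so not counted

phrase <- "my name is jeff"

# Approach 1: lapply over the columns, then Reduce with `+`
n_reduce <- sum(Reduce(`+`, lapply(profiles[7:16], grepl, pattern = phrase)))

# Approach 2: convert the columns to a matrix and call grepl once
n_matrix <- sum(grepl(phrase, as.matrix(profiles[7:16])))

# Approach 3: a single loop over the columns
n_loop <- 0
for (prf in profiles[7:16]) {
  n_loop <- n_loop + sum(grepl(phrase, prf))
}

c(n_reduce, n_matrix, n_loop)
```

All three counts should be identical, since each one counts the entries in columns 7 to 16 that contain the phrase.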