
R: Iterating through all elements of several columns to detect a phrase

I am trying to loop through several columns in a data frame which contain text files.

I want to check every entry of columns 7 through 16 to see if any of the text files contain a certain phrase.

Each time the phrase is detected, I want to increase a running count by 1.

This seems pretty straightforward. I think I should iterate over the columns and then over the rows, but I can't figure out exactly how to do this. My attempt is below.

Any suggestions? Thank you in advance for any insight.

fc_count <- 0

for (col in profiles[7:16]){
  for (row in 1:nrow(profiles)){

    if(isTRUE(grepl("my name is jeff", row)) == TRUE){

      fc_count = fc_count + 1

    }

  }

}

fc_count

We can use lapply to loop over columns 7 to 16 and apply grepl with the pattern, which gives a list of logical vectors (one per column). Reduce then collapses that list into a single integer vector by adding ( + ) the vectors element-wise, and sum gives the total count.

sum(Reduce(`+`, lapply(profiles[7:16], grepl, pattern = "my name is jeff")))
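To see what each step of that one-liner produces, here is a minimal sketch on a made-up data frame. The `toy` data and its column names are assumptions for illustration; its columns stand in for `profiles[7:16]`.

```r
# Hypothetical toy data standing in for profiles[7:16]
toy <- data.frame(
  a = c("hi, my name is jeff", "nothing here", "my name is jeff!"),
  b = c("my name is jeff", "hello", "bye"),
  c = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

# Step 1: one logical vector per column, TRUE where the phrase is found
hits <- lapply(toy, grepl, pattern = "my name is jeff")

# Step 2: element-wise addition across columns gives the number of
# matching cells in each row
per_row <- Reduce(`+`, hits)

# Step 3: total number of matching cells
total <- sum(per_row)
```

Here `per_row` holds one count per row, so the intermediate result also shows which rows contain the phrase.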

Because grepl is vectorized over its input, converting the data.frame to a matrix (a matrix is just a vector with a dim attribute) gives a more compact version:

sum(grepl("my name is jeff", as.matrix(profiles[7:16])))
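One thing to keep in mind: grepl returns a single TRUE/FALSE per entry, so this counts entries that contain the phrase, not repeated occurrences inside one entry. If every occurrence should count, a sketch along the following lines with gregexpr may help. The `toy` data is made up for illustration, and `fixed = TRUE` is an assumption that the phrase should be matched literally rather than as a regular expression.

```r
# Hypothetical toy data; the first cell contains the phrase twice
toy <- data.frame(
  a = c("my name is jeff. my name is jeff", "nothing"),
  b = c("my name is jeff", "bye"),
  stringsAsFactors = FALSE
)

# Flatten the selected columns into a plain character vector
cells <- as.vector(as.matrix(toy))

# Number of cells containing the phrase at least once
n_cells <- sum(grepl("my name is jeff", cells, fixed = TRUE))

# Number of occurrences overall: gregexpr returns the match positions
# for each cell, or -1 when there is no match
matches <- gregexpr("my name is jeff", cells, fixed = TRUE)
n_occurrences <- sum(vapply(matches, function(p) sum(p > 0), integer(1)))
```

On this toy data the two counts differ because one cell contains the phrase twice.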

With a for loop, the nested loops are unnecessary because grepl is vectorized: loop over the columns only and add the number of matches in each column.

fc_count <- 0
for (prf in profiles[7:16]) {
  fc_count <- fc_count + sum(grepl("my name is jeff", prf))
}
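For comparison, the nested loop from the question tests the row number itself (`row`) against the pattern, so it never looks at the text. A corrected version would index into the current column instead. This is only a sketch on a made-up `toy` data frame standing in for `profiles[7:16]`.

```r
# Hypothetical toy data standing in for profiles[7:16]
toy <- data.frame(
  a = c("hi, my name is jeff", "nothing here", "my name is jeff!"),
  b = c("my name is jeff", "hello", "bye"),
  stringsAsFactors = FALSE
)

# Nested version: iterating over a data frame yields its columns
fc_count <- 0
for (col in toy) {                  # col is one column (a character vector)
  for (row in seq_along(col)) {     # row is an index into that column
    if (grepl("my name is jeff", col[row])) {
      fc_count <- fc_count + 1
    }
  }
}

# Vectorized single loop, which should give the same count
vec_count <- 0
for (prf in toy) {
  vec_count <- vec_count + sum(grepl("my name is jeff", prf))
}
```

Both loops visit the same cells; the second simply lets grepl handle a whole column at once.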


 