
Counting words in a vector

I am currently enrolled in an R course, and one of the practice exercises is to build an R program that counts words in a string. We cannot use the function table, but must return the most frequent word in a string using conventional means. For example, given "The fox jumped over the cone and the...", the program would have to return "the", as it is the most frequent word.

So far I have the following:

string_read <- function(phrase) {

  phrase <- strsplit(phrase, " ")[[1]]
  for (i in 1:length(phrase)) {
    phrase.freq <- ....
    # if word already exists then increase counter by 1

  }
}

I've hit a roadblock, however, as I'm not sure how to increase the counter for specific words. Can anyone give me a pointer in the right direction? My pseudocode would be something like: "For every word that is looped through, increase wordIndex by 1. If the word has already occurred before, increase the wordIndex counter."

You started off correctly by splitting the string into words. Next, we loop over each word using sapply and, for each one, count how many elements of the vector are equal to it. I have used tolower, assuming the comparison is not case sensitive.

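string_read <- function(phrase) {
   # split on spaces and lower-case so "The" and "the" count as the same word
   temp = tolower(unlist(strsplit(phrase, " ")))
   # count the matches for each word; which.max returns the first position with the highest count
   which.max(sapply(temp, function(x) sum(x == temp)))
}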

phrase <- "The fox jumped over the cone and the"

string_read(phrase)
#the 
#  1 

This returns the word together with its index position, which is 1 in this case (which.max reports the first position that holds the maximum count). If you just want the word with the maximum count, you can change the last line to

temp[which.max(sapply(temp, function(x) sum(x == temp)))]
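
For readers who want the explicit counter from the question rather than sapply, here is a minimal sketch of that idea, assuming words are separated by single spaces and that case should be ignored. The function name most_common_word and the variable counts are illustrative and do not come from the original post.

```r
# Sketch: count words with an explicit loop and a named counter vector.
# `most_common_word` and `counts` are hypothetical names used for illustration.
most_common_word <- function(phrase) {
  words <- tolower(strsplit(phrase, " ")[[1]])
  counts <- integer(0)                 # named integer vector: names are the words
  for (w in words) {
    if (w %in% names(counts)) {
      counts[w] <- counts[w] + 1L      # word seen before: increment its counter
    } else {
      counts[w] <- 1L                  # first occurrence: start its counter at 1
    }
  }
  names(counts)[which.max(counts)]     # first word that reaches the highest count
}

most_common_word("The fox jumped over the cone and the")
# [1] "the"
```

Because which.max returns the first maximum, ties go to the word that appeared earliest in the phrase.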

We can also do this with str_extract_all from stringr, which extracts runs of word characters and therefore ignores punctuation such as commas.

library(stringr)
string_read<- function(str1) {
  temp <- tolower(unlist(str_extract_all(str1, "\\w+")))
  which.max(sapply(temp, function(x) sum(x == temp)))
}

phrase <- "The fox jumped over the cone and the"
string_read(phrase)
#the 
#  1 
phrase2 <- "The fox jumped over the cone and the fox, fox, fox, fox, fox"
string_read(phrase2)
#fox 
#  2 
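
To see why the tokenization step matters for a phrase with commas, the sketch below compares splitting on spaces with extracting word characters. It uses base R's regmatches() and gregexpr() as a stand-in for str_extract_all, so no package is needed; the names by_space and by_regex are illustrative.

```r
phrase2 <- "The fox jumped over the cone and the fox, fox, fox, fox, fox"

# Splitting on spaces keeps the trailing commas attached to the words,
# so "fox" and "fox," are counted as different words.
by_space <- tolower(strsplit(phrase2, " ")[[1]])
sum(by_space == "fox")    # 2
sum(by_space == "fox,")   # 4

# Extracting runs of word characters drops the punctuation
# (base R counterpart of str_extract_all(phrase2, "\\w+")).
by_regex <- tolower(regmatches(phrase2, gregexpr("\\w+", phrase2))[[1]])
sum(by_regex == "fox")    # 6
```

With the space-based split, "the" (3 occurrences) would still beat "fox" (2), which is why the regex-based version gives a different answer for this phrase.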
