Extracting a specified word from a vector using R

Question

I have a text, e.g.:

text <- "i am happy today :):)"

I want to extract :) from the text vector and report its frequency.

Answer 1

Here's one idea, which would be easy to generalize:

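text <- c("i was happy yesterday :):)",
          "i am happy today :)",
          "will i be happy tomorrow?")

# Each removed ":)" shortens the string by 2 characters;
# fixed = TRUE treats ":)" as a literal string rather than a regex
(nchar(text) - nchar(gsub(":)", "", text, fixed = TRUE))) / 2
# [1] 2 1 0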
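
One way the idea above might be generalized is sketched here. The helper name count_fixed is hypothetical (it is not part of the original answer), and the sketch assumes the pattern is a literal string whose occurrences do not overlap:

```r
# Hypothetical helper: counts non-overlapping occurrences of a literal
# pattern in each element of a character vector.
count_fixed <- function(x, pattern) {
  stopifnot(nchar(pattern) > 0)
  # Remove every occurrence, then divide the lost characters by the
  # pattern length to recover the number of occurrences
  removed <- gsub(pattern, "", x, fixed = TRUE)
  (nchar(x) - nchar(removed)) / nchar(pattern)
}

text <- c("i was happy yesterday :):)",
          "i am happy today :)",
          "will i be happy tomorrow?")

smiley_counts <- count_fixed(text, ":)")
happy_counts  <- count_fixed(text, "happy")
```

Dividing by nchar(pattern) instead of a hard-coded 2 is the only change needed to make the original one-liner work for other search strings.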

Answer 2

I assume you only want the count, or do you also want to remove :) from the string?

For the count you can do:

length(gregexpr(":)", text)[[1]])

which gives 2. A more general solution for a vector of strings is:

sapply(gregexpr(":)", text), length)

Edit:

Josh O'Brien pointed out that this also returns 1 if there is no :), since gregexpr returns -1 in that case. To fix this you can use:

sapply(gregexpr(":)", text), function(x) sum(x > 0))

Which does become slightly less pretty.
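
To make the -1 issue concrete, here is a small sketch that compares the two counts on the three-string vector from Answer 1. The variable names m, naive and corrected are illustrative only, and fixed = TRUE is an added assumption so that :) is matched as a literal string:

```r
text <- c("i was happy yesterday :):)",
          "i am happy today :)",
          "will i be happy tomorrow?")

m <- gregexpr(":)", text, fixed = TRUE)

# The third string has no match, so gregexpr reports a single position of -1
no_match_pos <- m[[3]][1]

# length() counts that -1 as if it were a match
naive <- sapply(m, length)

# sum(x > 0) counts only real (positive) match positions
corrected <- sapply(m, function(x) sum(x > 0))
```

Here naive comes out as 2, 1, 1 while corrected comes out as 2, 1, 0, which is the behaviour described in the edit.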

Answer 3

This does the trick but might not be the most direct way:

mytext <- "i am happy today :):)"

# Wrap each smiley in semicolons so there is something to split on
myTextSub <- gsub(":)", ";:);", mytext, fixed = TRUE)

# Then split and unlist
myTextSplit <- unlist(strsplit(myTextSub, ";"))

# Then see how many times the smiley turns up
length(grep(":)", myTextSplit, fixed = TRUE))

EDIT

To handle vectors of text with length > 1, don't unlist:

mytext <- rep("i am happy today :):)", 2)
myTextSub <- gsub(":)", ";:);", mytext, fixed = TRUE)
myTextSplit <- strsplit(myTextSub, ";")

# Count the pieces containing the smiley within each element
sapply(myTextSplit, function(x) {
  length(grep(":)", x, fixed = TRUE))
})

But I like the other answers better.
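
Since the question asks to extract :) as well as count it, a possible complement to the answers above is to pull out the matches themselves with base R's regmatches and then tabulate them. This is only a sketch: the variable names hits, per_string and overall are illustrative, and lengths() assumes R 3.2.0 or later:

```r
text <- c("i was happy yesterday :):)",
          "i am happy today :)",
          "will i be happy tomorrow?")

# Extract every literal ":)" from each string; strings without a match
# yield character(0), so no special handling of -1 is needed
hits <- regmatches(text, gregexpr(":)", text, fixed = TRUE))

# Number of smileys per string
per_string <- lengths(hits)

# Frequency table over the whole vector
overall <- table(unlist(hits))
```

With this input per_string is 2, 1, 0 and the table reports a total of 3 for :).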

3 answers

Answer 1
5 accepted 2012-04-11 07:44:45

Answer 2
3 2012-04-11 07:50:58

Answer 3
1 2012-04-11 07:43:03
