Extracting a specified word from a vector using R
I have a text, e.g.:
text<- "i am happy today :):)"
I want to extract :) from the text vector and report its frequency.
Here's one idea, which would be easy to generalize:
text<- c("i was happy yesterday :):)",
"i am happy today :)",
"will i be happy tomorrow?")
(nchar(text) - nchar(gsub(":)", "", text))) / 2
# [1] 2 1 0
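To show how that generalizes, here is a minimal sketch that wraps the same length-difference trick in a function. The name count_fixed is hypothetical (it is not from the answer above), and fixed = TRUE is an added assumption so that the pattern is matched as a literal string rather than as a regular expression.

```r
# Hypothetical helper (not part of the original answer): counts
# non-overlapping occurrences of a literal string in each element of x.
count_fixed <- function(x, pattern) {
  stopifnot(nchar(pattern) > 0)
  # fixed = TRUE treats `pattern` as a literal string, not a regex
  stripped <- gsub(pattern, "", x, fixed = TRUE)
  # Characters removed, divided by the pattern length, gives the count
  (nchar(x) - nchar(stripped)) / nchar(pattern)
}

text <- c("i was happy yesterday :):)",
          "i am happy today :)",
          "will i be happy tomorrow?")
count_fixed(text, ":)")
# [1] 2 1 0
```

Dividing by nchar(pattern) instead of a hard-coded 2 is what lets the same function count patterns of any length.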
I assume you only want the count, or do you also want to remove :) from the string?
For the count you can do:
length(gregexpr(":)",text)[[1]])
which gives 2. A more general solution for a vector of strings is:
sapply(gregexpr(":)",text),length)
Josh O'Brien pointed out that this also returns 1 if there is no :), since gregexpr returns -1 in that case. To fix this you can use:
sapply(gregexpr(":)",text),function(x)sum(x>0))
This does become slightly less pretty.
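As an illustration of the -1 issue, and one possible alternative, the sketch below pairs gregexpr with base R's regmatches and lengths (lengths needs R 3.2.0 or later). It is not from the answer above, and fixed = TRUE is again an added assumption so that ":)" is matched literally.

```r
text <- c("i was happy yesterday :):)",
          "i am happy today :)",
          "will i be happy tomorrow?")

# fixed = TRUE (an addition here) matches ":)" as a literal string
m <- gregexpr(":)", text, fixed = TRUE)

# With no match, gregexpr returns a single -1, so length() is still 1
as.integer(m[[3]])
# [1] -1

# regmatches() returns character(0) for elements with no match,
# so lengths() gives the true counts
lengths(regmatches(text, m))
# [1] 2 1 0
```

Whether this reads better than sum(x > 0) is a matter of taste; both avoid counting the -1 placeholder as a match.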
This does the trick but might not be the most direct way:
mytext<- "i am happy today :):)"
# The following line inserts semicolons to split on
myTextSub<-gsub(":)", ";:);", mytext)
# Then split and unlist
myTextSplit <- unlist(strsplit(myTextSub, ";"))
# Then see how many times the smiley turns up
length(grep(":)", myTextSplit))
EDIT
To handle vectors of text with length > 1, don't unlist:
mytext<- rep("i am happy today :):)",2)
myTextSub<-gsub(":\\)", ";:\\);", mytext)
myTextSplit <- strsplit(myTextSub, ";")
sapply(myTextSplit,function(x){
length(grep(":)", x))
})
But I like the other answers better.