Searching for words in a corpus with R
I am trying to search for strings of words in a corpus using R. Are disjunctive statements allowed in grep, e.g., grep("a" or "b" or "c"...)? If so, once I have that subcorpus, how do I then refine it further to contain only those examples with at least two tokens of the original condition?
Yes, the vertical bar | works as an or-operator in grep. You can look up regular expressions in R by running ?regex.
So, to give an example:
grep("ape|bass|cat", c("monkey", "bass", "catfish"))
[1] 2 3
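The call above returns the positions of the matching elements. To get the matching elements themselves (the subcorpus), something like the following should work in base R. The `corpus` vector here is a made-up toy example, not data from the question.

```r
# Hypothetical toy corpus; each element stands in for one text or sentence.
corpus <- c("monkey", "bass", "catfish")
pattern <- "ape|bass|cat"

# value = TRUE returns the matching elements instead of their indices.
subcorpus <- grep(pattern, corpus, value = TRUE)

# grepl() returns a logical vector, which is handy for subsetting.
is_match <- grepl(pattern, corpus)
print(subcorpus)
print(is_match)
```

Note that "cat" matches inside "catfish" because the pattern is not anchored to word boundaries.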
Also see the documentation of grep, grepl, and that family of functions. The stringr package provides additional tools for handling text.
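For the second part of the question (keeping only examples with at least two tokens of the original condition), one possible approach is to count the matches per element with gregexpr() and filter on that count. This is a minimal sketch in base R; the `corpus` vector and the variable names are illustrative assumptions.

```r
# Hypothetical toy corpus; each element stands in for one example.
corpus <- c("the cat chased the ape",
            "a bass swam by",
            "cat, cat and more cat",
            "no animals here")
pattern <- "ape|bass|cat"

# gregexpr() returns, for each element, the start positions of all matches,
# or a single -1 when there is no match.
matches <- gregexpr(pattern, corpus)

# Count matches per element; the -1 "no match" marker is not counted.
n_hits <- vapply(matches, function(m) sum(m > 0), integer(1))

# Keep only the examples with at least two matching tokens.
refined <- corpus[n_hits >= 2]
print(refined)
```

If stringr is installed, its str_count() function gives the same per-element counts. To avoid counting substrings of longer words, the alternatives can be wrapped in word boundaries, for example "\\b(ape|bass|cat)\\b".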