
Searching for words in a corpus with R

I am trying to search for strings of words in a corpus using R. Are disjunctive statements allowed in grep, e.g., grep("a" or "b" or "c"...)? If so, once I have that subcorpus, how do I then refine it further to contain only those examples with at least two tokens of the original condition?

Yes, the vertical bar | works as an or-operator in grep. You can look up regular expressions in R by running ?regex.

So, to give an example:

grep("ape|bass|cat", c("monkey", "bass", "catfish"))
[1] 2 3
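
By default grep returns the positions of the matching elements. To get the matching elements themselves (the subcorpus), something like the following should work. This is a minimal sketch in base R; the variable names `words`, `hits`, and `is_hit` are made up for illustration.

```r
# Same toy vector as in the example above.
words <- c("monkey", "bass", "catfish")

# value = TRUE returns the matching elements instead of their indices.
hits <- grep("ape|bass|cat", words, value = TRUE)

# grepl() returns one TRUE/FALSE per element, which is handy for subsetting.
is_hit <- grepl("ape|bass|cat", words)
words[is_hit]
```

Note that the pattern matches substrings, which is why "catfish" is picked up by "cat".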

Also see the documentation of grep, grepl, and that family of functions. The stringr package provides additional tools for handling text.
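
For the second part of the question (keeping only the examples with at least two tokens of the original condition), one possible approach is to count the matches per element with gregexpr and filter on that count. The sketch below uses base R only; the corpus and the names `corpus`, `pattern`, `subcorpus`, `n_hits`, and `refined` are hypothetical, and I am assuming the corpus is a plain character vector.

```r
# Hypothetical corpus: one text per element.
corpus <- c("the cat chased the ape",
            "a bass swam by",
            "catfish and bass",
            "nothing here")
pattern <- "ape|bass|cat"

# Step 1: keep the elements with at least one match.
subcorpus <- corpus[grepl(pattern, corpus)]

# Step 2: count the matches in each remaining element.
# gregexpr() returns -1 when nothing matches, so only positive
# start positions are counted.
n_hits <- vapply(gregexpr(pattern, subcorpus),
                 function(m) sum(m > 0),
                 integer(1))

# Step 3: keep the elements with two or more matches.
refined <- subcorpus[n_hits >= 2]
refined
```

The count includes repeated matches of the same alternative and matches inside longer words (the "cat" in "catfish"). If only whole words should count, wrapping the alternatives in word boundaries, e.g. "\\b(ape|bass|cat)\\b", is one option to try.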
