Using grep to filter rows with two or more patterns in the string in R

I need to index all the rows that have a string beginning with either "B-" or "B^" in one of the columns. I tried a bunch of combinations, but I suspect it is not working because "-" and "^" also have special meaning in the grep pattern.

dataset[grep('^(B-|B^)[^B-|B^]*$', dataset$Col1),]

With the above script, rows beginning with "B^" are not being extracted. Please suggest a smart way to handle this.

You can escape the special characters with \\ in the grep pattern:

dataset[grep('^(B\\-|B\\^)[^B\\-|B\\^]*$', dataset$Col1),]

For further explanation, ^ matches the beginning of a string as an anchor, so you have to escape it when it appears in the middle of the pattern. The [] is a character class, and inside it B-| is read as a range from B to |, so [^B-|B^]* matches any run of characters outside that range (which already covers ^) rather than "anything that is not B-, | or B^". The character class is unnecessary here.

The simplified regex is:

dataset[grep('^(B-|B\\^)', dataset$Col1),]
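To make the escaping concrete, here is a minimal sketch on a made-up data frame. The names `dataset` and `Col1` come from the question; the sample values and the `value` column are assumptions for illustration only. It compares the escaped alternation with two equivalent ways of selecting the same rows.

```r
# Toy data; only `dataset` and `Col1` come from the question, the values are invented.
dataset <- data.frame(
  Col1 = c("B-101", "B^202", "AB-303", "B404", "C^505"),
  value = 1:5,
  stringsAsFactors = FALSE
)

# Escaped alternation: "\\^" in an R string is the regex \^, a literal caret.
idx_alt <- grep("^(B-|B\\^)", dataset$Col1)

# Bracket form: inside [], a leading "-" is literal, and "^" is literal
# when it is not the first character, so no backslashes are needed.
idx_class <- grep("^B[-^]", dataset$Col1)

# No regex at all: startsWith() compares plain prefixes (base R >= 3.3.0).
idx_plain <- which(startsWith(dataset$Col1, "B-") | startsWith(dataset$Col1, "B^"))

# Rows whose Col1 begins with "B-" or "B^"
dataset[idx_alt, ]
```

All three index vectors select the same rows here. The bracket form is the shortest, while `startsWith()` sidesteps regex metacharacters entirely, which may be easier to read when the prefixes are fixed strings.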
