
How can I match a line containing only a single instance of a pattern with grep?

Given a text file, say phrases.txt, with the following contents:

Hahahahahasdhfjshfjshdhfjhdf
Hahahaha!
jdsahjhshfjhfHahahaha!dhsjfhajhfjhf
Hahaha!Hahaha!
dfhjfsf
sdfjsjf Hahaha! djfhjsdfh
Ha! hdfshdfs
Ha! Ha! Ha!

What would be an appropriate grep command in bash that outputs only the lines containing exactly one occurrence of laughter, where laughter is defined as a string of the form Hahahahaha! with arbitrarily many ha's? The first H is always capital and the others are not, and the string must end in !. In my example, the egrep command should output:

Hahahaha!
jdsahjhshfjhfHahahaha!dhsjfhajhfjhf
sdfjsjf Hahaha! djfhjsdfh
Ha! hdfshdfs

A command I came up with was:

egrep "(Ha(ha)*\!){1}" phrases.txt

The issue with my command is that it does not output only the lines with a single occurrence of laughter. With my command, line 4 (Hahaha!Hahaha!) and line 8 (Ha! Ha! Ha!) also get printed, which is not what I want.

Is there a nice way to do this with only grep?

If you are okay with pipes, then:

egrep '(Ha(ha)*!)' yourfile.txt | egrep -v '(Ha(ha)*!).*(Ha(ha)*!)'

First filter for lines with at least one laugh, then filter out the ones that have more than one laugh.
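
As a rough sketch, the two stages can be run separately against the sample data to see what each one keeps. The temporary file and the variable names at_least_one and exactly_one are illustrative, and grep -E is used here as the equivalent, non-deprecated spelling of egrep.

```shell
# Recreate the sample file from the question in a temporary location.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Hahahahahasdhfjshfjshdhfjhdf
Hahahaha!
jdsahjhshfjhfHahahaha!dhsjfhajhfjhf
Hahaha!Hahaha!
dfhjfsf
sdfjsjf Hahaha! djfhjsdfh
Ha! hdfshdfs
Ha! Ha! Ha!
EOF

# Stage 1: keep lines that contain at least one laugh.
at_least_one=$(grep -E 'Ha(ha)*!' "$sample")

# Stage 2: drop lines in which a second laugh follows the first one.
exactly_one=$(printf '%s\n' "$at_least_one" | grep -Ev 'Ha(ha)*!.*Ha(ha)*!')

printf '%s\n' "$exactly_one"
rm -f "$sample"
```

Stage 1 alone still lets Hahaha!Hahaha! and Ha! Ha! Ha! through; it is the -v in stage 2 that removes them.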

Note: {1} only repeats the previous chunk; it doesn't check the rest of the string to insist that there is only one. So a{1} and a are actually the same.
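
A quick way to convince yourself of this is to run the pattern with and without {1} on the sample lines and compare the results. This is only a sketch; the variable names are made up for illustration.

```shell
# Sample lines from the question.
phrases='Hahahahahasdhfjshfjshdhfjhdf
Hahahaha!
jdsahjhshfjhfHahahaha!dhsjfhajhfjhf
Hahaha!Hahaha!
dfhjfsf
sdfjsjf Hahaha! djfhjsdfh
Ha! hdfshdfs
Ha! Ha! Ha!'

# Same pattern, once with the {1} quantifier and once without it.
with_quantifier=$(printf '%s\n' "$phrases" | grep -E '(Ha(ha)*!){1}')
without_quantifier=$(printf '%s\n' "$phrases" | grep -E 'Ha(ha)*!')

if [ "$with_quantifier" = "$without_quantifier" ]; then
  echo "same lines matched"
else
  echo "different lines matched"
fi
```

Both commands select the same lines, including the two that contain several laughs, because grep reports a line as soon as the pattern matches anywhere in it.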

If you use a GNU grep or pcregrep that supports PCRE regex, you may use:

grep -P '^(?!(?:.*Ha(ha)*!){2}).*Ha(ha)*!'

The pattern is:

^(?!(?:.*YOUR_PATTERN_HERE){2}).*YOUR_PATTERN_HERE

where YOUR_PATTERN_HERE stands for the pattern you want to occur only once in the string.

Details

  • ^ - start of a string
  • (?!(?:.*YOUR_PATTERN_HERE){2}) - a negative lookahead that fails the match if, immediately to the right of the current location (here, the start of the string), there are two consecutive occurrences of:
    • .* - any 0+ chars other than line break chars
    • YOUR_PATTERN_HERE - your required pattern
  • .* - any 0+ chars other than line break chars
  • YOUR_PATTERN_HERE - your required pattern.
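
The template can also be wrapped in a small shell function so the pattern is written only once. This is a sketch under a few assumptions: the name grep_once is made up for illustration, it needs a grep built with -P support, and the pattern is inserted into the regex as-is (wrapped in a non-capturing group so that alternations stay contained).

```shell
# grep_once PATTERN [FILE...]
# Hypothetical helper: print lines in which PATTERN (a PCRE) occurs exactly once.
# Requires a grep that supports -P (for example GNU grep built with PCRE).
grep_once() {
  local pattern=$1
  shift
  # Single-quoted pieces keep "!" away from interactive history expansion.
  # The lookahead rejects lines with two occurrences; the tail requires one.
  local regex='^(?!(?:.*(?:'"$pattern"')){2}).*(?:'"$pattern"')'
  grep -P "$regex" "$@"
}

# Example: only the first line contains exactly one laugh.
printf '%s\n' 'Ha! hdfshdfs' 'Ha! Ha! Ha!' 'dfhjfsf' | grep_once 'Ha(ha)*!'
```

Any extra arguments are passed through to grep, so grep_once 'Ha(ha)*!' phrases.txt reads from a file instead of standard input.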

See the online demo:

s="Hahahahahasdhfjshfjshdhfjhdf
Hahahaha!
jdsahjhshfjhfHahahaha!dhsjfhajhfjhf
Hahaha!Hahaha!
dfhjfsf
sdfjsjf Hahaha! djfhjsdfh
Ha! hdfshdfs
Ha! Ha! Ha!"
echo "$s" | grep -P '^(?!(?:.*Ha(ha)*!){2}).*Ha(ha)*!'

Output:

Hahahaha!
jdsahjhshfjhfHahahaha!dhsjfhajhfjhf
sdfjsjf Hahaha! djfhjsdfh
Ha! hdfshdfs
