How can I match a line containing only a single instance of a pattern with grep?

Given a text file, say phrases.txt, with these contents:

Hahahahahasdhfjshfjshdhfjhdf
Hahahaha!
jdsahjhshfjhfHahahaha!dhsjfhajhfjhf
Hahaha!Hahaha!
dfhjfsf
sdfjsjf Hahaha! djfhjsdfh
Ha! hdfshdfs
Ha! Ha! Ha!

What would be an appropriate grep command in bash that outputs only the lines containing exactly one occurrence of laughter, where laughter is defined as a string of the form Hahahahaha! with arbitrarily many "ha"s? The first H is always capitalized and the others are not, and the string must end in !. For my example, the egrep command should output:

Hahahaha!
jdsahjhshfjhfHahahaha!dhsjfhajhfjhf
sdfjsjf Hahaha! djfhjsdfh
Ha! hdfshdfs

A command I came up with was:

egrep "(Ha(ha)*\!){1}" phrases.txt

The issue with my command is that it does not restrict the output to lines with a single occurrence of laughter: line 4 (Hahaha!Hahaha!) and line 8 (Ha! Ha! Ha!) also get printed, which is not what I want.

Is there a nice way to do this with only grep?

If you are okay with pipes, then:

grep -E '(Ha(ha)*!)' yourfile.txt | grep -Ev '(Ha(ha)*!).*(Ha(ha)*!)'

First filter for lines with at least one laugh, then filter out the ones that have more than one laugh.
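A self-contained sketch of the two-pass pipeline, assuming bash and a grep that supports -E. The temporary file created with mktemp is only a stand-in for phrases.txt, and the function name single_laugh_lines is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: keep only lines with exactly one laugh, matched by the ERE Ha(ha)*!
# Assumes bash and a grep that supports -E (egrep is equivalent to grep -E).

laugh='Ha(ha)*!'

# Pass 1 keeps lines containing at least one laugh.
# Pass 2 (-v) drops lines where one laugh is followed, anywhere later, by another.
single_laugh_lines() {
  grep -E "$laugh" "$1" | grep -Ev "$laugh.*$laugh"
}

# Temporary stand-in for phrases.txt, filled with the sample lines from the question.
file=$(mktemp)
trap 'rm -f "$file"' EXIT
cat > "$file" <<'EOF'
Hahahahahasdhfjshfjshdhfjhdf
Hahahaha!
jdsahjhshfjhfHahahaha!dhsjfhajhfjhf
Hahaha!Hahaha!
dfhjfsf
sdfjsjf Hahaha! djfhjsdfh
Ha! hdfshdfs
Ha! Ha! Ha!
EOF

single_laugh_lines "$file"
```

Run as a script, this prints the same four lines the question lists as the desired output. Keeping the laugh pattern in one variable means both passes stay in sync if the definition of laughter changes.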

Note: {1} only repeats the previous chunk once; it doesn't check the rest of the string to ensure there is only one occurrence. So a{1} and a are actually the same.
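A quick way to check that note, assuming bash and grep -E: the same three sample lines are filtered with and without {1}, and both runs keep all of them, including the lines with several laughs. The helper name matches is made up for this illustration.

```shell
#!/usr/bin/env bash
# Sketch: show that X{1} matches exactly what X matches.
sample='Hahaha!Hahaha!
Ha! hdfshdfs
Ha! Ha! Ha!'

# Print the lines of $sample that match the given ERE.
matches() {
  printf '%s\n' "$sample" | grep -E "$1"
}

with_quantifier=$(matches '(Ha(ha)*!){1}')
without_quantifier=$(matches 'Ha(ha)*!')

# Both variables hold all three lines: {1} does not mean "only one occurrence".
printf '%s\n' "$with_quantifier"
[ "$with_quantifier" = "$without_quantifier" ] && echo "identical results"
```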

If you use GNU grep with PCRE support (the -P option) or pcregrep, you may use:

grep -P '^(?!(?:.*Ha(ha)*!){2}).*Ha(ha)*!'

The pattern is:

^(?!(?:.*YOUR_PATTERN_HERE){2}).*YOUR_PATTERN_HERE

where YOUR_PATTERN_HERE stands for the pattern you want to occur only once in the string.
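The template can be wrapped in a small shell function. This is a sketch assuming GNU grep built with PCRE support; match_once is a hypothetical name. The function wraps the pattern in a non-capturing group so that an alternation like cat|dog is not split apart when it is spliced in. Because the pattern is pasted in twice, patterns with numbered backreferences such as \1 would not work as-is.

```shell
#!/usr/bin/env bash
# Sketch: print the lines in which a PCRE pattern occurs exactly once.
# Assumes GNU grep built with PCRE support (-P); match_once is a hypothetical name.
# Usage: match_once PATTERN [FILE...]   (reads stdin when no FILE is given)
match_once() {
  # Wrap the pattern in a non-capturing group so that an alternation such as
  # cat|dog stays intact when it is spliced into the template twice.
  local p="(?:$1)"
  shift
  # Single quotes around the fixed parts keep an interactive bash from
  # treating ! as a history expansion.
  grep -P '^(?!(?:.*'"$p"'){2}).*'"$p" "$@"
}

printf '%s\n' 'Hahaha!Hahaha!' 'sdfjsjf Hahaha! djfhjsdfh' 'Ha! Ha! Ha!' |
  match_once 'Ha(ha)*!'
```

Here only sdfjsjf Hahaha! djfhjsdfh is printed, since the other two lines contain the laugh more than once.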

Details

  • ^ - start of a string
  • (?!(?:.*YOUR_PATTERN_HERE){2}) - a negative lookahead that fails the match if, immediately to the right of the current location (here, the start of the string), there are two consecutive repetitions of
    • .* - zero or more characters other than line break characters
    • YOUR_PATTERN_HERE - your required pattern
  • .* - zero or more characters other than line break characters
  • YOUR_PATTERN_HERE - your required pattern
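To see what the negative lookahead rules out, its body can be run on its own as an ordinary pattern anchored at the start of the line. This sketch, assuming GNU grep with -P, prints the sample lines for which the lookahead fires, i.e. the lines the full pattern rejects because they contain two or more laughs.

```shell
#!/usr/bin/env bash
# Sketch: run the lookahead's body on its own to list the lines it would veto.
# Assumes GNU grep with PCRE support (-P).
s='Hahahahahasdhfjshfjshdhfjhdf
Hahahaha!
jdsahjhshfjhfHahahaha!dhsjfhajhfjhf
Hahaha!Hahaha!
dfhjfsf
sdfjsjf Hahaha! djfhjsdfh
Ha! hdfshdfs
Ha! Ha! Ha!'

# (?:.*Ha(ha)*!){2} succeeds when two laughs appear anywhere after the anchor,
# so these are exactly the lines for which (?!...) makes the full pattern fail.
vetoed=$(printf '%s\n' "$s" | grep -P '^(?:.*Ha(ha)*!){2}')
printf '%s\n' "$vetoed"
```

It prints Hahaha!Hahaha! and Ha! Ha! Ha!, the two lines the question wants excluded.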

See the online demo:

s="Hahahahahasdhfjshfjshdhfjhdf
Hahahaha!
jdsahjhshfjhfHahahaha!dhsjfhajhfjhf
Hahaha!Hahaha!
dfhjfsf
sdfjsjf Hahaha! djfhjsdfh
Ha! hdfshdfs
Ha! Ha! Ha!"
echo "$s" | grep -P '^(?!(?:.*Ha(ha)*!){2}).*Ha(ha)*!'

Output:

Hahahaha!
jdsahjhshfjhfHahahaha!dhsjfhajhfjhf
sdfjsjf Hahaha! djfhjsdfh
Ha! hdfshdfs
