
Finding strings in a file using regex and grep in Linux

I am trying to find the proper regexes to use with the grep command on the file text.txt.

Question

  1. Find all occurrences of words in text that have a substring ad, bd, cd, dd, or ed.

  2. Find all occurrences of numbers > 100.

  3. Find all occurrences of numbers > 100 that contain a digit 0 or 5.

My Approach

  1. grep -io '[ae]*d' text

    Prints words with the proper substrings, but doesn't print the whole string/word.

     ad d d ed d d ed d d d d ed d d
  2. grep -io '[199][1-9]*' text

    I believe I am way off on the regex, but it still prints the correct result.

     1973 197 17775
  3. grep -io '[05][1-9]*' text

    This builds on part 2, and I don't see how to carry the "> 100" condition from part 2 over into part 3, but I believe the piece that matches a digit 0 or 5 is correct.

     0 0 0 5

For part (a), the -o option to grep causes it to print only the part of the line that matches the pattern, but your pattern does not match whole words. Note also that [ae]*d allows zero leading letters and only covers a and e, which is why it prints a bare d; the core you want is [a-e]d. You simply need to adjust your pattern to match the parts of each word before and after the [a-e]d substring.
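As a minimal sketch of that adjustment, assuming a grep that supports -E and -o (GNU grep does) and a made-up sample sentence, since the contents of text are not shown in the question: the pattern below pads the [a-e]d core with optional letters on both sides, so -o prints the entire word. The function name find_words is only for illustration.

```shell
# Hypothetical sample input; the real contents of the file are not shown in the question.
sample='She added a bad idea to the old card index'

# [[:alpha:]]* on both sides extends the match to the whole word around
# the two-letter core [a-e]d (that is: ad, bd, cd, dd, or ed).
find_words() {
    printf '%s\n' "$1" | grep -Eio '[[:alpha:]]*[a-e]d[[:alpha:]]*'
}

find_words "$sample"
```

On this sample it prints added and bad, one per line; idea, old, card, and index are skipped because the letter before each d is not in the range a to e.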

For part (b), your pattern is all wrong. It will not match the numbers 299 or 1000, for instance. The digit pattern you want is a digit between 1 and 9 followed by at least two digits between 0 and 9. Note that this pattern also accepts 100 itself; if the comparison is strict, filter that one value out separately.
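A rough sketch of that digit pattern, assuming grep with -E, -o, and -w and a made-up line of numbers (the function names are hypothetical). Because "three or more digits with a nonzero lead" also accepts 100, the second function drops that single value to honor the strict "> 100".

```shell
# Hypothetical sample input, one line of mixed tokens.
sample='7 42 099 100 101 299 1000 17775'

# One digit 1-9, then at least two more digits 0-9, as a whole word.
# This accepts every integer >= 100 written without leading zeros.
at_least_100() {
    printf '%s\n' "$1" | grep -Eow '[1-9][0-9]{2,}'
}

# "Greater than 100" is strict, so drop the single value 100 afterwards.
# -x makes the pattern match the whole line; -v inverts the selection.
greater_than_100() {
    at_least_100 "$1" | grep -vx '100'
}

greater_than_100 "$sample"
```

On this sample the output is 101, 299, 1000, and 17775, one per line. The token 099 is rejected because -w refuses a match that is immediately preceded by another digit.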

Part (c) is the trickiest. You must match digit patterns containing at least three digits, the first being between 1 and 9, with either a 5 in the first position or a 0 or 5 in any other position. You probably need to separate that into alternatives with the | operator. It looks like you need three: the case where the lead digit is 5; the case where the second digit is either 0 or 5; and the case where some later digit is 0 or 5. In the third case, don't forget that there may be any number of additional digits, including none, on either side of the 0 or 5 you match.
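One possible way to write those three alternatives as a single extended regex, offered as a sketch rather than the only solution. It assumes grep with -E, -o, and -w; the sample line and the function name are made up. Since 100 satisfies the second alternative, it is filtered out at the end to keep the bound strict.

```shell
# Hypothetical sample input.
sample='50 100 105 123 250 512 1234 1973 17775 3001'

# Three alternatives, each requiring at least three digits in total:
#   5[0-9]{2,}            lead digit is 5
#   [1-9][05][0-9]+       second digit is 0 or 5
#   [1-9][0-9]+[05][0-9]* a 0 or 5 in the third position or later
over_100_with_0_or_5() {
    printf '%s\n' "$1" |
        grep -Eow '5[0-9]{2,}|[1-9][05][0-9]+|[1-9][0-9]+[05][0-9]*' |
        grep -vx '100'   # 100 is not greater than 100
}

over_100_with_0_or_5 "$sample"
```

On this sample the output is 105, 250, 512, 17775, and 3001, one per line.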

A) Find all occurrences of words in text that have a substring ad, bd, cd, dd, ed.

grep -ow '[[:alpha:]]*\(a\|b\|c\|d\|e\)d[[:alpha:]]*' text

or

grep -Eow '[[:alpha:]]*(a|b|c|d|e)d[[:alpha:]]*' text

B) Find all occurrences of numbers > 100

grep -ow '[1-9][0-9][0-9]\+' text

C) Find all occurrences of numbers > 100 that contain a digit 0 or 5

grep -ow '[1-9][0-9][0-9]\+' text | grep '\(0\|5\)'

or

grep -ow '[1-9][0-9][0-9]\+' text | grep -E '(0|5)'

I'm using the option -o to output every match on its own line rather than the whole line where the pattern was found, and the option -w, which requires a word boundary before and after the match.
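To see what -w changes, here is a small comparison on a made-up input, assuming grep with -E, -o, and -w; the function names are only illustrative.

```shell
# Hypothetical one-line input: a number glued to letters, then a standalone number.
sample='id1234 5678'

# Without -w: any run of three or more digits is printed, even inside a larger token.
without_w() {
    printf '%s\n' "$1" | grep -Eo '[1-9][0-9]{2,}'
}

# With -w: the match must not touch a letter, digit, or underscore on either side.
with_w() {
    printf '%s\n' "$1" | grep -Eow '[1-9][0-9]{2,}'
}

without_w "$sample"   # prints 1234 and 5678, one per line
with_w "$sample"      # prints only 5678
```

The same effect is why -w keeps the number patterns above from picking digits out of the middle of identifiers.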
