Finding strings in a file using regex and grep in Linux
I am trying to find the proper regexes to use with the grep command on the file text.txt.
Question
1. Find all occurrences of words in text that have a substring ad, bd, cd, dd, or ed.
2. Find all occurrences of numbers > 100.
3. Find all occurrences of numbers > 100 that contain a digit 0 or 5.
My Approach
grep -io '[ae]*d' text
Prints words with the proper substrings, but doesn't print the whole string/word.
ad d d ed d d ed d d d d ed d d
grep -io '[199][1-9]*' text
I believe I am way off on the regex, but it still prints the correct result.
1973 197 17775
grep -io '[05][1-9]*' text
This is a continuation of 2., and I don't understand how to carry the condition from 2. over into 3., but I believe I have the part about containing a digit 0 or 5 correct.
0 0 0 5
For part (a), the -o option to grep causes it to print only the part of the line that matches the pattern, but your pattern does not match whole words. You simply need to adjust your pattern to match the parts of each word before and after the [ae]d substring.
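As a rough sketch of that adjustment: the pattern below puts [[:alpha:]]* on each side of the substring so that the match grows to cover the whole word, and it widens [ae] to [a-e] so that bd, cd and dd are covered as well. The sample line and the variable names are made up for illustration, since the contents of text are not shown.

```shell
# Hypothetical sample input; the real file is not shown in the question.
sample='She added a bad code to the faded wall'

# [[:alpha:]]* before and after extends the match to the rest of the word.
# [a-e]d covers the substrings ad, bd, cd, dd and ed.
words=$(printf '%s\n' "$sample" | grep -io '[[:alpha:]]*[a-e]d[[:alpha:]]*')

# Each matching word is printed on its own line.
printf '%s\n' "$words"
```

On this sample the words added, bad and faded are printed; code is skipped because it contains none of the five substrings.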
For part (b), your pattern is all wrong. The bracket expression [199] matches a single character, either 1 or 9, and [1-9]* can never match a 0, so the pattern will not match the numbers 299 or 1000, for instance. The digit pattern you want is a digit between 1 and 9 followed by at least two digits between 0 and 9.
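One possible way to write that digit pattern is sketched below. The sample numbers are invented, and the -w flag is an assumption added here so that digits embedded in a longer word are skipped. Note that the pattern also matches 100 itself; the grep -vx step is just one way to drop it if the bound is meant to be strict.

```shell
# Hypothetical sample input with numbers on both sides of the limit.
sample='7 42 99 100 299 1000 007 17775'

# [1-9] is the leading digit; [0-9]\{2,\} requires at least two more digits.
nums=$(printf '%s\n' "$sample" | grep -ow '[1-9][0-9]\{2,\}')
printf '%s\n' "$nums"

# Optional: remove the exact value 100 to keep only numbers strictly above it.
strict=$(printf '%s\n' "$nums" | grep -vx '100')
printf '%s\n' "$strict"
```

The first command prints 100, 299, 1000 and 17775; the second prints the same list without 100.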
Part (c) is the trickiest. You must match digit patterns containing at least three digits, the first being between 1 and 9, with either a 5 in the first position or a 0 or 5 in any other position. You probably need to separate that into alternatives with the | operator. It looks like you probably need three: the case where the lead digit is 5; the case where the second digit is either 0 or 5; and the case where some later digit is 0 or 5. In the third case you mustn't forget that there may be any number of additional digits (possibly none) on either side of the 0 or 5 you match.
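The three cases could be combined roughly as follows. This is a sketch under assumptions: it uses grep -E so that | needs no backslash, it adds -w to keep only stand-alone numbers, and the sample numbers are made up.

```shell
# Hypothetical sample input.
sample='123 150 205 512 999 1973 17775 50 1000'

# Alternative 1: lead digit is 5, followed by at least two more digits.
# Alternative 2: second digit is 0 or 5, followed by at least one more digit.
# Alternative 3: a 0 or 5 in the third position or later.
hits=$(printf '%s\n' "$sample" |
  grep -Eow '5[0-9]{2,}|[1-9][05][0-9]+|[1-9][0-9]+[05][0-9]*')
printf '%s\n' "$hits"
```

On this sample the result is 150, 205, 512, 17775 and 1000. The numbers 123, 999 and 1973 contain no 0 or 5, and 50 has fewer than three digits.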
A) Find all occurrences of words in text that have a substring ad, bd, cd, dd, ed.
grep -ow '[[:alpha:]]*\(a\|b\|c\|d\|e\)d[[:alpha:]]*' text
or
egrep -ow '[[:alpha:]]*(a|b|c|d|e)d[[:alpha:]]*' text
B) Find all occurrences of numbers > 100
grep -ow '[1-9][0-9][0-9]\+' text
C) Find all occurrences of numbers > 100 that contain a digit 0 or 5
grep -ow '[1-9][0-9][0-9]\+' text | grep '\(0\|5\)'
or
grep -ow '[1-9][0-9][0-9]\+' text | egrep '(0|5)'
I'm using the option -o to output every match on its own line instead of the whole line where the pattern was found, and the option -w, which specifies that the match must be preceded and followed by a word boundary.
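To see what the two options do, here is a small comparison on a made-up line (the sample text and variable names are assumptions for illustration only):

```shell
# Hypothetical sample input: 5678 is glued to a letter, 1234 stands alone.
sample='abc 1234 x5678 99'

# -o alone prints each match on its own line, even inside a longer word.
without_w=$(printf '%s\n' "$sample" | grep -o '[0-9][0-9][0-9]\+')
printf '%s\n' "$without_w"

# Adding -w keeps only matches with a word boundary on both sides.
with_w=$(printf '%s\n' "$sample" | grep -ow '[0-9][0-9][0-9]\+')
printf '%s\n' "$with_w"
```

Without -w both 1234 and 5678 are printed; with -w only 1234 remains, because 5678 is directly preceded by the letter x.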