How to find a string before and after a specified word using grep and regex
For example, I have a file that contains the following set of strings:
AAA1111BBB -> FILE1
AAA2222BBB -> FILE2
AAA3333BBB -> FILE3
Now, in Unix, I want to extract all IDs by matching AAA at the start of the line and BBB at the end of the pattern. The output should look something like this:
1111
2222
3333
Then I want to remove all duplicate entries and save the result to a file. How can I do this?
If you have the grep -P option available, you can try this regex:
(?<=A{3})\d+(?=B{3})
It uses lookarounds to find the digits surrounded by AAA and BBB.
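To show how that regex might fit into a complete command, here is a minimal sketch. It assumes GNU grep (for -P and -o), and the file names input.txt and output.txt and the sample data are placeholders for illustration only.

```shell
# Hypothetical sample input, including one duplicate ID.
printf '%s\n' 'AAA1111BBB' 'AAA2222BBB' 'AAA1111BBB' 'AAA3333BBB' > input.txt

# -o prints only the matched text; -P enables Perl-compatible lookarounds (GNU grep).
# sort -u drops duplicate IDs before the result is saved.
grep -oP '(?<=A{3})\d+(?=B{3})' input.txt | sort -u > output.txt

cat output.txt
```

Because the lookarounds are zero-width, only the digits end up in the output, so no second pass is needed to strip AAA and BBB.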
Well, your small example could be handled with this command:
sed -e 's/^AAA//' -e 's/BBB.*//' input.txt | sort -u > output.txt
But my guess is that your toy example may not fully explain what you are trying to accomplish...
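One caveat with the sed command is that lines not starting with AAA pass through unchanged. If the real input contains such lines, a possible variant is to print only the lines where the substitution succeeds. This is a sketch under the assumption that each ID line may carry trailing text after BBB; the sample data and file names are made up.

```shell
# Hypothetical sample input with a non-matching line and a duplicate ID.
printf '%s\n' 'AAA1111BBB -> FILE1' 'some other line' \
              'AAA2222BBB -> FILE2' 'AAA1111BBB -> FILE1' > input.txt

# -n suppresses automatic printing; the p flag prints only lines where
# the substitution matched, keeping just the text between AAA and BBB.
# Note: .* is greedy, so the capture runs up to the last BBB on the line.
sed -n 's/^AAA\(.*\)BBB.*/\1/p' input.txt | sort -u > output.txt

cat output.txt
```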
With GNU awk: 使用GNU awk:
gawk '
match($1, /^AAA(.*)BBB$/, m) {keys[m[1]]=1}
END {for (k in keys) print k}
' file
Or with Perl:
perl -nE '/^AAA(\w+)BBB/ and $k{$1}=1 }END{ say join "\n", keys %k' file
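The three-argument form of match() in the gawk answer is a GNU extension. Where only a POSIX awk is available, a rough equivalent can be sketched with the two-argument match() and substr(). The file names and sample data below are assumptions for illustration.

```shell
# Hypothetical sample input, including one duplicate ID.
printf '%s\n' 'AAA1111BBB' 'AAA2222BBB' 'AAA1111BBB' > input.txt

awk '
# Two-argument match() sets RSTART and RLENGTH for the matched text.
match($1, /^AAA.*BBB$/) {
    # Drop the 3-character prefix and the 3-character suffix.
    id = substr($1, RSTART + 3, RLENGTH - 6)
    # Print each ID once, in first-seen order.
    if (!(id in seen)) { seen[id] = 1; print id }
}
' input.txt > output.txt

cat output.txt
```

Printing on first sight keeps the input order, whereas iterating over an array in an END block returns keys in an unspecified order.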
I assume that your IDs are 4-digit numbers:
grep -oE "AAA[0-9]{4}BBB" <filename> | grep -oE "[0-9]{4}"
Edit:
If you have something like "AAA12@3BBB":
grep -oE "AAA.{4}BBB" <filename> | grep -oE "[0-9@]{4}"