How to use grep and regex to find a string before and after a specified word

Question

For example, I have a file that contains the following set of strings:

AAA1111BBB -> FILE1
AAA2222BBB -> FILE2
AAA3333BBB -> FILE3

Now, in Unix, I want to extract all IDs by searching for a pattern with AAA at the start of the line and BBB at the end of the pattern. The output will be something like this:

1111
2222
3333

Then I want to remove all duplicate entries and save the result in a file. How can I do this?

Answer 1

If you have the grep -P option available, you can try

(?<=A{3})\d+(?=B{3})

This regex uses lookarounds (a lookbehind for AAA and a lookahead for BBB) to find the digits surrounded by AAA and BBB, so only the digits themselves are part of the match.
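A minimal sketch of how that regex might be used end to end, assuming GNU grep built with PCRE support. The file names ids.txt and unique_ids.txt are hypothetical, and the sample data includes one duplicate to show the deduplication step the question asks for:

```shell
# Sample input (hypothetical file name), including a duplicate ID.
printf '%s\n' AAA1111BBB AAA2222BBB AAA3333BBB AAA2222BBB > ids.txt

# -o prints only the matched text; -P enables Perl-compatible regex (GNU grep).
# sort -u sorts the IDs and drops duplicates before saving them.
grep -oP '(?<=A{3})\d+(?=B{3})' ids.txt | sort -u > unique_ids.txt

cat unique_ids.txt
```

Because the lookarounds are zero-width, the AAA and BBB markers are required for a match but never appear in the output.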

Answer 2

Well, your little example could be accomplished with this command:

sed -e 's/^AAA//' -e 's/BBB.*//' input.txt | sort -u > output.txt

But my guess is that your toy example may not sufficiently explain what you are trying to accomplish...
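One thing to keep in mind with the command above is that sed prints every input line by default, so lines that do not contain the AAA...BBB pattern pass through unchanged. If that matters, a possible variant (a sketch, not the answerer's own command) uses -n with a capture group so only matching lines are printed. The sample data here is made up for illustration:

```shell
# Sample input with a duplicate ID and one line that should be ignored.
printf '%s\n' AAA1111BBB 'unrelated line' AAA2222BBB AAA1111BBB > input.txt

# -n suppresses sed's default printing; the trailing p prints a line only
# when the substitution succeeded. \1 is the text captured between AAA and BBB.
sed -n 's/^AAA\(.*\)BBB.*/\1/p' input.txt | sort -u > output.txt

cat output.txt
```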

Answer 3

With GNU awk:

gawk '
    match($1, /^AAA(.*)BBB$/, m) {keys[m[1]]=1}
    END {for (k in keys) print k}
' file

or perl:

perl -nE '/^AAA(\w+)BBB/ and $k{$1}=1 }END{ say join "\n", keys %k' file
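The three-argument form of match() used above is a gawk extension. Where only a POSIX awk is available, a rough equivalent might look like the sketch below, which strips the fixed-width AAA and BBB markers with substr() instead of a capture group. The file names sample.txt and awk_ids.txt are hypothetical, and this version prints IDs in first-seen order rather than hash order:

```shell
# Sample input with a duplicate ID and a non-matching line.
printf '%s\n' AAA1111BBB AAA3333BBB AAA1111BBB 'no match' > sample.txt

# For lines that start with AAA and end with BBB, drop the first three and
# last three characters; seen[] ensures each ID is printed only once.
awk '/^AAA.+BBB$/ {
    id = substr($0, 4, length($0) - 6)
    if (!seen[id]++) print id
}' sample.txt > awk_ids.txt

cat awk_ids.txt
```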

Answer 4

I assume that your IDs are 4-digit numbers:

grep -oE "AAA[0-9]{4}BBB" <filename> | grep -oE "[0-9]{4}"

Edit:

If you have something like "AAA12@3BBB":

grep -oE "AAA.{4}BBB" <filename> | grep -oE "[0-9@]{4}"


Answer 1
2 2015-01-29 14:08:42

Answer 2
0 2015-01-29 15:02:13

Answer 3
0 2015-01-29 15:39:53

Answer 4
0 2015-01-29 17:19:21
