How to find string before and after specified word using Grep and RegEx

Question

For example, I have files that contain the following set of strings:

AAA1111BBB -> FILE1
AAA2222BBB -> FILE2
AAA3333BBB -> FILE3

Now, in Unix, I want to extract all IDs by matching AAA at the start of the line and BBB at the end of the pattern. The output should look something like this:

1111
2222
3333

Then I want to remove all duplicate entries and save the result in a file. How can I do this?

Answer 1

If your grep supports the -P option, you can try this regex:

(?<=A{3})\d+(?=B{3})

It uses lookarounds to find the digits surrounded by AAA and BBB. Combine -P with -o so that grep prints only the matched digits rather than the whole line.
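As a rough sketch of how the regex above might be used end to end, assuming GNU grep built with PCRE support; the file names input.txt and output.txt and the sample data are illustrative assumptions, not part of the answer:

```shell
# Sample data shaped like the question's example, with one duplicate ID
# (input.txt is an assumed file name).
printf '%s\n' 'AAA1111BBB' 'AAA2222BBB' 'AAA3333BBB' 'AAA2222BBB' > input.txt

# -o prints only the matched text; -P enables Perl-compatible regexes (GNU grep).
# sort -u sorts the IDs and drops duplicates before they are saved.
grep -oP '(?<=A{3})\d+(?=B{3})' input.txt | sort -u > output.txt

cat output.txt
```

The lookbehind and lookahead are not part of the match, so -o emits just the digits.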

Answer 2

Well, your little example could be accomplished with this command:

sed -e 's/^AAA//' -e 's/BBB.*//' input.txt | sort -u > output.txt

But, my guess is that your toy example may not sufficiently explain exactly what you are trying to accomplish...
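One caveat with the command above is that lines not starting with AAA pass through unchanged. A possible variant, offered as a sketch rather than as this answer's method, uses sed -n with a capture group so that only matching lines are printed; the sample data is an assumption for illustration:

```shell
# Sample data including a line that does not match the AAA...BBB shape.
printf '%s\n' 'AAA1111BBB' 'some other line' 'AAA2222BBB' 'AAA1111BBB' > input.txt

# -n suppresses default output; the trailing p prints only lines where the
# substitution succeeded. \(.*\) captures the text between AAA and BBB.
sed -n 's/^AAA\(.*\)BBB.*/\1/p' input.txt | sort -u > output.txt

cat output.txt
```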

Answer 3

With GNU awk:

gawk '
    match($1, /^AAA(.*)BBB$/, m) {keys[m[1]]=1}
    END {for (k in keys) print k}
' file

Or with Perl:

perl -nE '/^AAA(\w+)BBB/ and $k{$1}=1 }END{ say join "\n", keys %k' file

Answer 4

I assume that your IDs are 4 digit numbers:

grep -oE "AAA[0-9]{4}BBB" <filename> | grep -oE "[0-9]{4}"

Edit:

If you have something like "AAA12@3BBB":

grep -oE "AAA.{4}BBB" <filename> | grep -oE "[0-9@]{4}"
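The commands above extract the IDs but do not remove duplicates or save them. A minimal sketch that adds those two steps, assuming 4-digit IDs as in this answer; the file names ids.txt and unique_ids.txt are hypothetical:

```shell
# Sample data with a repeated ID (ids.txt is an assumed file name).
printf '%s\n' 'AAA1111BBB' 'AAA3333BBB' 'AAA1111BBB' 'AAA2222BBB' > ids.txt

# The first grep isolates the AAA....BBB token, the second keeps only the digits,
# and sort -u removes duplicates before the result is written out.
grep -oE 'AAA[0-9]{4}BBB' ids.txt | grep -oE '[0-9]{4}' | sort -u > unique_ids.txt

cat unique_ids.txt
```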

4 answers

solution1
2 2015-01-29 14:08:42

solution2
0 2015-01-29 15:02:13

solution3
0 2015-01-29 15:39:53

solution4
0 2015-01-29 17:19:21
