How to find the string before and after a specified word using grep and regex

For example, I have a file that contains the following set of strings:


Now, in Unix, I want to extract all IDs by searching for lines where AAA is at the start of the line and BBB marks the end of the ID. The output should look something like this:


Then I want to remove all duplicate entries and save the result to a file. How can I do this?

If your grep supports the -P option (Perl-compatible regular expressions), you can try


This regex uses lookarounds to find the digits surrounded by AAA and BBB.
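
A minimal sketch of that lookaround approach, assuming GNU grep built with PCRE support and purely numeric IDs. The function name extract_ids and the file names are hypothetical. The lookbehind (?<=^AAA) and lookahead (?=BBB) are zero-width, so with -o only the digits between them are printed, and sort -u removes duplicates:

```shell
# extract_ids (hypothetical name): print the digits that follow a leading
# "AAA" and precede "BBB", one unique ID per line.
# Assumes GNU grep with PCRE support (-P); BSD/busybox grep may lack it.
extract_ids() {
  grep -oP '(?<=^AAA)[0-9]+(?=BBB)' "$@" | sort -u
}

# Usage (input.txt and output.txt are placeholder file names):
# extract_ids input.txt > output.txt
```

Because the function passes "$@" straight to grep, it also reads standard input when called without a file argument.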

Well, your little example could be accomplished with this command:

sed -e 's/^AAA//' -e 's/BBB.*//' input.txt | sort -u > output.txt

But my guess is that your toy example may not fully explain what you are trying to accomplish.
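
One caveat with the sed command above: any line that does not start with AAA is passed through unchanged. If the real file contains other lines, a slightly stricter variant (a sketch; the function name extract_ids_sed is hypothetical) uses -n so that only lines matching ^AAA.*BBB are edited and printed. It keeps the same rule of cutting at the first BBB:

```shell
# extract_ids_sed (hypothetical name): same substitutions as the sed answer,
# but -n suppresses automatic printing, and the { ... p; } block runs only on
# lines that start with AAA and contain BBB.
extract_ids_sed() {
  sed -n '/^AAA.*BBB/{s/^AAA//;s/BBB.*//;p;}' "$@" | sort -u
}

# Usage (placeholder file names):
# extract_ids_sed input.txt > output.txt
```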

With GNU awk:

gawk '
    match($1, /^AAA(.*)BBB$/, m) {keys[m[1]]=1}
    END {for (k in keys) print k}
' file
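
The three-argument match() used above is a GNU awk extension. If only a POSIX awk is available, something like the following sketch should work (the function name extract_ids_awk is hypothetical). Unlike the gawk version, it looks at the whole line rather than the first field, and it cuts at the first BBB instead of requiring BBB at the end. It also prints each ID only the first time it appears, so the original order is preserved and no separate sort is needed:

```shell
# extract_ids_awk (hypothetical name): portable awk, no GNU extensions.
extract_ids_awk() {
  awk '/^AAA.*BBB/ {
         id = $0
         sub(/^AAA/, "", id)      # drop the leading marker
         sub(/BBB.*/, "", id)     # drop everything from the first BBB on
         if (!seen[id]++) print id  # print each ID once, in input order
       }' "$@"
}

# Usage (placeholder file names):
# extract_ids_awk input.txt > output.txt
```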

Or with Perl:

perl -nE '/^AAA(\w+)BBB/ and $k{$1}=1 }END{ say join "\n", keys %k' file

Assuming your IDs are 4-digit numbers:

grep -oE "AAA[0-9]{4}BBB" <filename> | grep -oE "[0-9]{4}" | sort -u > output.txt


If your IDs can contain other characters, for example "AAA12@3BBB":

grep -oE "AAA.{4}BBB" <filename> | grep -oE "[0-9@]{4}" | sort -u > output.txt
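
Both grep answers assume the ID is exactly four characters long. A sketch that drops that assumption (the function name extract_any_ids is hypothetical) treats any run of non-whitespace characters between AAA and BBB as the ID, then strips the markers with sed instead of a second grep. Note that the + is greedy, so a token such as AAA1BBB2BBB yields 1BBB2, and the pattern is not anchored to the start of the line:

```shell
# extract_any_ids (hypothetical name): variable-length IDs made of any
# non-whitespace characters, e.g. "12@3" from "AAA12@3BBB".
extract_any_ids() {
  grep -oE 'AAA[^[:space:]]+BBB' "$@" |
    sed -e 's/^AAA//' -e 's/BBB$//' |   # remove the AAA/BBB markers
    sort -u                             # remove duplicates
}

# Usage (placeholder file names):
# extract_any_ids input.txt > output.txt
```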

