
How to find string before and after specified word using Grep and RegEx

For example, I have a file that contains the following set of strings:

AAA1111BBB -> FILE1
AAA2222BBB -> FILE2
AAA3333BBB -> FILE3

Now, in Unix, I want to extract all IDs by matching AAA at the start of the line and BBB at the end of the pattern. The output should look something like this:

1111
2222
3333

Then I want to remove all duplicate entries and save the result to a file. How can I do this?

If your grep has the -P option available, you can try:

(?<=A{3})\d+(?=B{3})

This regex uses lookarounds to match only the digits that are preceded by AAA and followed by BBB.
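To make the regex above usable end to end, here is a minimal sketch of how it might be combined with `grep -o` and `sort -u`. It assumes GNU grep built with PCRE support; `extract_ids_grep` is a hypothetical helper name, and the fourth sample line (a repeated ID) is added only to show the deduplication. Note that this regex does not anchor AAA to the start of the line.

```shell
# extract_ids_grep is a hypothetical helper name; it reads lines on stdin.
# -o prints only the matched text; -P enables Perl-compatible regexes (GNU grep).
# sort -u sorts the IDs and drops duplicates.
extract_ids_grep() {
    grep -oP '(?<=A{3})\d+(?=B{3})' | sort -u
}

# Sample lines from the question, plus one repeated ID (an assumption for the demo).
printf '%s\n' 'AAA1111BBB -> FILE1' 'AAA2222BBB -> FILE2' \
              'AAA3333BBB -> FILE3' 'AAA1111BBB -> FILE4' | extract_ids_grep
```

To save the result instead of printing it, the helper can be redirected, for example `extract_ids_grep < input.txt > output.txt`, where both file names are placeholders.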

Well, your little example could be accomplished with this command:

sed -e 's/^AAA//' -e 's/BBB.*//' input.txt | sort -u > output.txt

But my guess is that your toy example may not fully explain what you are trying to accomplish...
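If the real input also contains lines that do not follow the AAA...BBB shape, the two-substitution command above would pass them through unchanged. A slightly stricter sketch, assuming the IDs consist only of digits and that your sed accepts `-E` (GNU and BSD sed both do), prints a line only when the whole pattern matched. `extract_ids_sed` is a hypothetical helper name.

```shell
# extract_ids_sed is a hypothetical helper name; it reads lines on stdin.
# -n suppresses the default output; the trailing p prints a line only when
# the substitution succeeded, so lines without AAA<digits>BBB are dropped.
extract_ids_sed() {
    sed -nE 's/^AAA([0-9]+)BBB.*/\1/p' | sort -u
}

# Two lines with the same ID and one unrelated line (made-up demo data).
printf '%s\n' 'AAA1111BBB -> FILE1' 'unrelated line' \
              'AAA1111BBB -> FILE2' | extract_ids_sed
```

As with the original command, redirecting the output (for example `> output.txt`) saves the deduplicated IDs to a file.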

With GNU awk:

gawk '
    match($1, /^AAA(.*)BBB$/, m) {keys[m[1]]=1}
    END {for (k in keys) print k}
' file
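The three-argument form of `match` is a gawk extension, and `for (k in keys)` visits keys in an unspecified order. Where gawk is not available, a portable sketch along the following lines might work with any POSIX awk. It assumes, like the gawk version, that the ID sits in the first whitespace-separated field; unlike it, it requires a non-empty ID and prints IDs in first-seen order. `extract_ids_awk` is a hypothetical helper name.

```shell
# extract_ids_awk is a hypothetical helper name; it reads lines on stdin.
extract_ids_awk() {
    awk '
        # Only consider first fields of the form AAA...BBB with a non-empty middle.
        $1 ~ /^AAA.+BBB$/ {
            # Strip the 3-character prefix and the 3-character suffix.
            id = substr($1, 4, length($1) - 6)
            # Print each ID only the first time it is seen.
            if (!(id in seen)) { seen[id] = 1; print id }
        }
    '
}

# Made-up demo data: a repeated ID and one line that does not start with AAA.
printf '%s\n' 'AAA2222BBB -> FILE1' 'AAA1111BBB -> FILE2' \
              'AAA2222BBB -> FILE3' 'XAAA9BBB -> FILE4' | extract_ids_awk
```

Because duplicates are filtered inside awk, no separate `sort -u` step is needed unless sorted output is wanted.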

Or with Perl:

perl -nE '/^AAA(\w+)BBB/ and $k{$1}=1 }END{ say join "\n", keys %k' file

I assume that your IDs are 4-digit numbers:

grep -oE "AAA[0-9]{4}BBB" <filename> | grep -oE "[0-9]{4}"

Edit:

If you have something like "AAA12@3BBB":

grep -oE "AAA.{4}BBB" <filename> | grep -oE "[0-9@]{4}"
