For example, I have files that contain the following strings:
AAA1111BBB -> FILE1
AAA2222BBB -> FILE2
AAA3333BBB -> FILE3
Now, in Unix, I want to extract all IDs by matching AAA at the start of the line and BBB at the end of the pattern. The output should look something like this:
1111
2222
3333
Then I want to remove all duplicate entries and save the result in a file. How can I do this?
If you have the grep -P option available, you can try
(?<=A{3})\d+(?=B{3})
This regex uses lookarounds to find the digits surrounded by AAA and BBB.
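As a rough sketch of how that regex could be used for the whole task, the pipeline below adds -o (print only the matched text) and pipes the result through sort -u to drop duplicates. The file names input.txt and output.txt are assumptions for illustration, and -P requires a grep built with PCRE support (typically GNU grep).

```shell
# Sample input mirroring the question, with one duplicate ID (file name is hypothetical).
printf 'AAA1111BBB\nAAA2222BBB\nAAA3333BBB\nAAA1111BBB\n' > input.txt

# -o prints only the matched digits; -P enables Perl-style lookarounds.
# sort -u removes duplicate IDs before the result is saved.
grep -oP '(?<=A{3})\d+(?=B{3})' input.txt | sort -u > output.txt

cat output.txt
```

Because the lookarounds are zero-width, only the digits end up in the match, so no second pass is needed to strip AAA and BBB.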
Well, your little example could be accomplished with this command:
sed -e 's/^AAA//' -e 's/BBB.*//' input.txt | sort -u > output.txt
But, my guess is that your toy example may not sufficiently explain exactly what you are trying to accomplish...
With GNU awk:
gawk '
match($1, /^AAA(.*)BBB$/, m) {keys[m[1]]=1}
END {for (k in keys) print k}
' file
Or with Perl:
perl -nE '/^AAA(\w+)BBB/ and $k{$1}=1 }END{ say join "\n", keys %k' file
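The three-argument form of match() used above is a gawk extension. If only a POSIX awk is available, a similar result can probably be had with RSTART and RLENGTH, as in this sketch. The file names and the variable names id and seen are assumptions for illustration; unlike the gawk version, it prints each ID the first time it is seen, so input order is preserved.

```shell
# Sample input with one duplicate ID (file name is hypothetical).
printf 'AAA1111BBB\nAAA2222BBB\nAAA3333BBB\nAAA2222BBB\n' > input.txt

awk '
match($1, /^AAA.*BBB$/) {
    # Strip the 3-character AAA prefix and the 3-character BBB suffix.
    id = substr($1, RSTART + 3, RLENGTH - 6)
    # Print each ID only the first time it appears.
    if (!(id in seen)) { seen[id] = 1; print id }
}
' input.txt > output.txt

cat output.txt
```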
I assume that your IDs are 4-digit numbers:
grep -oE "AAA[0-9]{4}BBB" <filename> | grep -oE "[0-9]{4}"
Edit:
If you have something like "AAA12@3BBB":
grep -oE "AAA.{4}BBB" <filename> | grep -oE "[0-9@]{4}"
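The grep pipelines in this answer print every match, including repeats. To cover the rest of the question (removing duplicates and saving to a file), one option is to append sort -u and a redirect, as in the sketch below. The file names input.txt and output.txt are assumptions for illustration.

```shell
# Sample input with one duplicate ID (file name is hypothetical).
printf 'AAA1111BBB\nAAA2222BBB\nAAA3333BBB\nAAA3333BBB\n' > input.txt

# First grep isolates the AAA....BBB token, second grep keeps only the digits,
# and sort -u drops duplicates before the result is written out.
grep -oE 'AAA[0-9]{4}BBB' input.txt | grep -oE '[0-9]{4}' | sort -u > output.txt

cat output.txt
```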