I have a file containing multiple lines like this:
NODE_1_length Prodigal:2.6 CDS 11 274 . + 0 ID=PROKKA_00001;inference=ab initio prediction:Prodigal:2.6;locus_tag=PROKKA_00001;product=hypothetical protein
And I want to extract the ID=PROKKA_[whatever number] and everything that comes after 'product=' to obtain an output like this:
ID=PROKKA_00001 product=hypothetical protein
I am not very skilled in using sed, so I tried to adapt some solutions I found here and elsewhere, but didn't manage to get them working. It is also fine if the solution comes in two steps (one for the ID, one for the product); I can then merge the two results into a single file.
I would be grateful if you could include an explanation of the regex used.
So far I tried to split the problem in two (starting from the ID) and tried:
grep -o 'ID=PROKKA_[0-9]{1,5}*'
sed 's/^ID=PROKKA[0-9]*;//g/
grep -Po 'ID="K[^"]*'
but of course none of them worked. Thanks for helping!
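You may use grep -oE. The -o option prints only the matched text, one match per line, and -E enables extended regular expressions. In the pattern, ID=PROKKA_[0-9]+ matches the literal ID=PROKKA_ followed by one or more digits, | means "or", and product=[^;:]+ matches product= followed by one or more characters that are neither ; nor a colon. The command is: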
grep -oE 'ID=PROKKA_[0-9]+|product=[^;:]+' file
ID=PROKKA_00001
product=hypothetical protein
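If you want each ID and its product on the same line, use grep + paste. Here paste -d ' ' - - joins every two consecutive lines with a space, so this relies on each input line yielding exactly one ID match followed by one product match:
grep -oE 'ID=PROKKA_[0-9]+|product=[^;:]+' file | paste -d ' ' - -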
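Since the question asked about sed specifically, here is a minimal sketch of a single-pass sed alternative. It assumes a sed that supports -E (GNU and BSD sed both do) and that product= is the last field on each line, as in the sample. The temporary file stands in for the real input file.

```shell
# Hypothetical sample input: one line copied from the question.
input=$(mktemp)
cat > "$input" <<'EOF'
NODE_1_length Prodigal:2.6 CDS 11 274 . + 0 ID=PROKKA_00001;inference=ab initio prediction:Prodigal:2.6;locus_tag=PROKKA_00001;product=hypothetical protein
EOF

# -n            suppress automatic printing; only lines where the substitution
#               succeeds are printed (via the trailing p flag)
# -E            extended regular expressions, so ( ) and + need no backslashes
# .*            everything before the ID (discarded)
# (ID=PROKKA_[0-9]+)   capture group 1: the ID and its digits
# ;.*           the semicolon and all fields in between (discarded)
# (product=.*)$ capture group 2: product= and the rest of the line
# \1 \2         replace the whole line with the two groups, space-separated
result=$(sed -nE 's/.*(ID=PROKKA_[0-9]+);.*(product=.*)$/\1 \2/p' "$input")
echo "$result"

rm -f "$input"
```

Unlike the grep + paste pipeline, this keeps the ID and the product paired per input line, and lines that lack either field are simply skipped.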