
Extract substring with sed/awk/grep from .gff file

I have a .gff file containing multiple lines like this:

NODE_1_length   Prodigal:2.6    CDS     11      274     .       +       0       ID=PROKKA_00001;inference=ab initio prediction:Prodigal:2.6;locus_tag=PROKKA_00001;product=hypothetical protein

I want to extract ID=PROKKA_ followed by its number, plus everything that comes after 'product=', to get output like this:

ID=PROKKA_00001 product=hypothetical protein

I am not very skilled with sed, so I tried adapting some solutions I found here and elsewhere, but couldn't get them to work. A two-step solution (one command for the ID, one for the product) is also fine; I can merge the two results into a single file afterwards.

I would be grateful if you could include an explanation of the regex used.

So far I have tried splitting the problem in two, starting with the ID, and tried:

grep -o 'ID=PROKKA_[0-9]{1,5}*'
sed 's/^ID=PROKKA[0-9]*;//g/
grep -Po 'ID="K[^"]*'

None of them worked. Thanks for your help!
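The three attempts fail for different reasons. Below is a sketch of minimally corrected versions, run against a small sample file, sample.gff (a hypothetical name), holding the record from the question; it assumes the file is tab-separated as GFF normally is, and that grep supports -o (a GNU/BSD extension, not POSIX).

```shell
#!/bin/sh
# Hypothetical sample file with the record from the question (tab-separated, like GFF).
printf 'NODE_1_length\tProdigal:2.6\tCDS\t11\t274\t.\t+\t0\tID=PROKKA_00001;inference=ab initio prediction:Prodigal:2.6;locus_tag=PROKKA_00001;product=hypothetical protein\n' > sample.gff

# Attempt 1: in a basic regex, { and } are literal characters, so [0-9]{1,5}
# never matched. -E enables extended regex, where {1,5} means "1 to 5 digits";
# the stray * after the interval is dropped.
a1=$(grep -oE 'ID=PROKKA_[0-9]{1,5}' sample.gff)

# Attempt 2: the closing quote was missing, the extra / after g is rejected,
# ^ anchored the match to the start of the line (the ID is in column 9),
# PROKKA was not followed by _, and s/...// deletes the match instead of
# keeping it. Capture the ID in \( \) and print only that capture (\1).
a2=$(sed -n 's/.*\(ID=PROKKA_[0-9]*\).*/\1/p' sample.gff)

# Attempt 3: the data contains no double quotes, so ID=" never matched.
# Match up to the next ; (the GFF attribute separator) instead; -P (Perl regex,
# GNU grep only) is not needed for this, so -E is used.
a3=$(grep -oE 'ID=[^;]+' sample.gff)

printf '%s\n' "$a1" "$a2" "$a3"
```

All three print ID=PROKKA_00001 for the sample line.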

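You can use grep -oE. The -o option prints only the matched text, each match on its own line, and -E enables extended regular expressions, so + and | work without backslashes. The pattern has two alternatives separated by |: ID=PROKKA_[0-9]+ matches the literal prefix followed by one or more digits, and product=[^;:]+ matches product= followed by every character up to the next ; or : (or the end of the line):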

grep -oE 'ID=PROKKA_[0-9]+|product=[^;:]+' file

Output:

ID=PROKKA_00001
product=hypothetical protein
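Since the question asks about sed in particular, here is a sketch of the same extraction as a single sed substitution that also puts both fields on one line. It assumes ID= comes before product= in each record (as in the sample line) and uses a hypothetical sample file, sample.gff.

```shell
#!/bin/sh
# Hypothetical sample file with the record from the question (tab-separated, like GFF).
printf 'NODE_1_length\tProdigal:2.6\tCDS\t11\t274\t.\t+\t0\tID=PROKKA_00001;inference=ab initio prediction:Prodigal:2.6;locus_tag=PROKKA_00001;product=hypothetical protein\n' > sample.gff

# One substitution, read left to right:
#   .*\(ID=PROKKA_[0-9]*\)   skip to the ID and capture it as \1
#   .*product=\([^;]*\)      skip to product= and capture its value (up to the next ;) as \2
#   .*                       discard the rest of the line
# -n together with the p flag prints only lines where the substitution matched.
result=$(sed -n 's/.*\(ID=PROKKA_[0-9]*\).*product=\([^;]*\).*/\1 product=\2/p' sample.gff)
printf '%s\n' "$result"
```

Because -n suppresses unmatched lines, records missing either attribute are skipped rather than printed half-filled.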

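If you want each record's ID and product on the same line, pipe the grep output through paste. Note that paste -s joins every match in the whole file into one line, so for a file with many records use paste - - instead, which joins consecutive pairs of lines; -d ' ' separates them with a space rather than the default tab. This assumes every line contains both an ID and a product:

grep -oE 'ID=PROKKA_[0-9]+|product=[^;:]+' file | paste -d ' ' - -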
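Since the title also mentions awk, here is a hedged sketch that parses only the attribute column instead of scanning the whole line. It assumes the file is tab-separated, as GFF requires (the sample in the question shows spaces, probably from copy-pasting), and again uses a hypothetical sample.gff.

```shell
#!/bin/sh
# Hypothetical sample file with the record from the question (tab-separated, like GFF).
printf 'NODE_1_length\tProdigal:2.6\tCDS\t11\t274\t.\t+\t0\tID=PROKKA_00001;inference=ab initio prediction:Prodigal:2.6;locus_tag=PROKKA_00001;product=hypothetical protein\n' > sample.gff

# -F'\t' splits on tabs, so $9 is the whole attribute column even though
# the product value contains spaces.
result=$(awk -F'\t' '
{
    id = ""; product = ""                # reset for every record
    n = split($9, attrs, ";")            # key=value pairs are separated by ;
    for (i = 1; i <= n; i++) {
        if (attrs[i] ~ /^ID=/)      id = attrs[i]
        if (attrs[i] ~ /^product=/) product = attrs[i]
    }
    if (id != "" && product != "") print id, product   # space-separated (default OFS)
}' sample.gff)
printf '%s\n' "$result"
```

This prints the ID before the product regardless of their order in the attribute column, and header lines such as ##gff-version 3 have no ninth column, so they produce no output.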
