How to extract a part of the header in a FASTA file using a Linux command

I have a FASTA file with unique headers, and I would like to extract a part of each header using a regular expression in Unix.

For example, my FASTA file starts with this header:

>jgi|Penbr2|47586|fgenesh1_pm.1_#_25  

and I would like to extract just the last part of this header, like this:

>fgenesh1_pm.1_#_25

I tried this regular expression in the vim editor, but it did not work:

:%s/^([^|]+\|){3}//g

or

:%s/^([A-Z][0-9]+\|){3}//g

I would appreciate any suggestions.

You can use sed:

sed -e 's/>.*|/>/' fasta-file

That is, everything from > up to the last | is replaced by a single > (.* is greedy, so the match runs to the last | on the line).
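A quick way to check the command above on something shaped like a real file: this sketch feeds a made-up two-record FASTA sample (the second header and both sequence lines are invented for illustration) through the same sed expression, showing that only the header lines change.

```shell
#!/bin/sh
# Hypothetical FASTA sample: header lines start with '>', sequence lines do not.
sample='>jgi|Penbr2|47586|fgenesh1_pm.1_#_25
ATGCGTACGTTAGC
>jgi|Penbr2|47587|fgenesh1_pm.1_#_26
GGCTAACGT'

# '.*' is greedy, so '>.*|' spans from '>' to the LAST '|' on the line.
# Sequence lines contain neither '>' nor '|', so they pass through unchanged.
printf '%s\n' "$sample" | sed -e 's/>.*|/>/'
```

To keep the result, redirecting the output to a new file (sed -e 's/>.*|/>/' fasta-file > new-file) is the portable option.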

I don't know whether the leading > is also part of your text. I'll assume it is not.

Since you tagged the question with vim, I'll just post the vim solution.

You can make use of the greediness of the regex:

In vim:

%s/.*|//

This leaves only the last part; it is the easiest way.
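The same greedy behaviour can be seen outside vim, because sed's .* is greedy too. This sketch uses sed only as a stand-in for the vim command and contrasts .*| with [^|]*|, which cannot cross a | and therefore stops at the first one.

```shell
#!/bin/sh
header='>jgi|Penbr2|47586|fgenesh1_pm.1_#_25'

# Greedy: '.*|' extends to the LAST '|', so only the final field remains
# (the leading '>' is removed as well, just like %s/.*|// in vim).
printf '%s\n' "$header" | sed 's/.*|//'

# For contrast: '[^|]*|' cannot cross a '|', so only the first field is removed.
printf '%s\n' "$header" | sed 's/[^|]*|//'
```

The first command prints fgenesh1_pm.1_#_25; the second prints Penbr2|47586|fgenesh1_pm.1_#_25.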

In vim you can also use \zs, \ze and non-greedy matching:

%s/\zs.\{-}\ze[^|]\+$//

Of course, if you prefer grouping, you can use \(...\) to group instead of \zs and \ze.
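Outside vim, POSIX sed has no non-greedy quantifier like vim's \{-}. One alternative, swapped in here rather than taken from the answer, is to let awk split each header line on | and keep only the last field. Unlike the vim commands above, this sketch keeps the leading >, and the sequence line in the sample is made up.

```shell
#!/bin/sh
sample='>jgi|Penbr2|47586|fgenesh1_pm.1_#_25
ATGCGTACGTTAGC'

# -F'|' makes '|' the field separator, so $NF is the last header field.
# Headers without any '|' (NF == 1) and sequence lines are printed unchanged.
printf '%s\n' "$sample" | awk -F'|' '/^>/ && NF > 1 { print ">" $NF; next } { print }'
```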

In your code, you grouped with plain (...) without escaping. I don't know how you configured the magic setting in your vimrc, but with the default you have to escape ( and ) to give them their special meaning (grouping, here), just as in BRE. The same goes for + and {3}, which must be written \+ and \{3}, while \| means alternation rather than a literal |. With the default setting, your first pattern would become :%s/^\([^|]\+|\)\{3}//. Do a :h magic and look at the table to see the differences.

In vim, use :h on these terms to get detailed information.
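To make the magic/BRE comparison concrete, here is the regex from the question written for sed in both BRE and ERE syntax. This is a sketch: the ^> anchor and the > in the replacement are additions so that the leading > survives, and \{1,\} stands in for + because POSIX BRE has no + operator.

```shell
#!/bin/sh
header='>jgi|Penbr2|47586|fgenesh1_pm.1_#_25'

# BRE (sed's default): grouping and intervals need backslashes,
# and a bare '|' is an ordinary literal character.
printf '%s\n' "$header" | sed 's/^>\([^|]\{1,\}|\)\{3\}/>/'

# ERE (sed -E): ( ), + and {3} are special without backslashes, and a bare '|'
# means alternation, so the literal bar is written \| here.
printf '%s\n' "$header" | sed -E 's/^>([^|]+\|){3}/>/'
```

Both commands print >fgenesh1_pm.1_#_25. Note that escaping works in opposite directions: in BRE (like vim's default magic) the backslash turns on the special meaning, while in ERE it turns it off.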
