
Extract the Acc (gene ID or accession number) from a FASTA file

What does ".gb\|(. )\|. ","\1" mean in the gsub function?


If the file contains a single FASTA sequence, you can solve the problem by reading the first line of the file and splitting it on the pipe character (|).

If the file contains multiple sequences, check the first character of each line and look for the > character, which marks a header line.

Here is a code example in Python. If you need a different ID, change the index.

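with open('AE004437.faa') as fh:
    # The first line of the file is the FASTA header.
    header_line = fh.readline().strip()
    # Split the header into its pipe-delimited fields.
    ids = header_line.split('|')
    # Change the index to pick a different field.
    gene_id = ids[3]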
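
For the multi-sequence case mentioned above, a minimal sketch might look like the following. The function name extract_ids, the sample headers, and the assumption that every header follows a pipe-delimited layout such as >gi|number|gb|accession|description are illustrative and do not come from the original question.

```python
def extract_ids(lines, index=3):
    """Collect one pipe-delimited field from every FASTA header line.

    `lines` is any iterable of strings (an open file handle works).
    `index` selects the field; 3 is assumed to be the accession here.
    """
    ids = []
    for line in lines:
        # Header lines start with '>'; all other lines are sequence data.
        if line.startswith('>'):
            fields = line.strip().split('|')
            # Skip headers that do not have enough fields.
            if len(fields) > index:
                ids.append(fields[index])
    return ids


# Made-up records used only to illustrate the assumed header layout.
example = [
    '>gi|0000001|gb|EXAMPLE00001.1| hypothetical protein A\n',
    'MKTAYIAKQR\n',
    '>gi|0000002|gb|EXAMPLE00002.1| hypothetical protein B\n',
    'MSLLTEVETY\n',
]

accessions = extract_ids(example)
print(accessions)
```

To run it on a real file, pass the open file handle instead of the list, for example extract_ids(fh) inside a with open(...) block.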
