Extract the Acc (gene ID or accession number) from a FASTA file

Question

What does ".gb\|(. )\|. ","\1" mean in the gsub function?
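The pattern in the question looks like it lost its * characters when it was rendered. The sketch below assumes the intended pattern was .*gb\|(.*)\|.* and that the function is R's gsub, where "\\1" in the replacement is a backreference to the first parenthesized group. It uses Python's re.sub, which behaves the same way here: the whole header is matched and replaced by whatever group 1 captured. The header is a made-up NCBI-style example, not taken from the question.

```python
import re

# Hypothetical NCBI-style FASTA header, invented for illustration
header = ">gi|2984127|gb|AAC07663.1| hypothetical protein"

# Assumed reconstruction of the question's pattern:
#   .*gb\|  -> everything up to and including "gb|"
#   (.*)    -> capture group 1 (greedy, so it backtracks to the last "|")
#   \|.*    -> the final "|" and the rest of the line
pattern = r".*gb\|(.*)\|.*"

# Replace the entire match with the text captured by group 1,
# analogous to gsub(pattern, "\\1", x) in R
accession = re.sub(pattern, r"\1", header)
print(accession)  # AAC07663.1
```

Because the pattern matches the whole line, only the captured accession survives the substitution.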


Answer 1

If your file contains a single FASTA sequence, you can solve the problem by reading the first line of the file and splitting it on the pipe character |.

If the file contains multiple sequences, you can check the first character of each line and look for the > character, which marks a header line.
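For the multi-sequence case, a minimal sketch might look like the following. The function name header_ids is hypothetical, and it assumes every header has at least field + 1 pipe-separated fields (headers with fewer fields are skipped). It accepts any iterable of lines, so you can pass an open file handle.

```python
def header_ids(lines, field=3):
    """Return the pipe-separated field at index `field` from every FASTA header.

    Hypothetical helper: header lines are recognized by a leading '>',
    and headers with too few fields are skipped rather than raising.
    """
    ids = []
    for line in lines:
        # Only header lines start with '>'; sequence lines are ignored
        if line.startswith('>'):
            parts = line.rstrip('\n').split('|')
            if len(parts) > field:
                ids.append(parts[field])
    return ids


# Usage with a file (path taken from the answer below):
# with open('AE004437.faa') as fh:
#     print(header_ids(fh))
```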

Here is a code example in Python. If you need a different ID, change the index.

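# Read only the header (first line) of a single-sequence FASTA file
with open('AE004437.faa') as fh:
    header_line = fh.readline().rstrip('\n')
# Split the header on '|'; change the index to select a different ID field
ids = header_line.split('|')
gene_id = ids[3]
print(gene_id)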
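To see which index holds which ID, it can help to print every field of a split header. The header below is a made-up example in the legacy NCBI gi|...|gb|... style; real files may use a different layout, so check your own first line the same way.

```python
# Hypothetical NCBI-style header line, invented for illustration
header_line = ">gi|2984127|gb|AAC07663.1| hypothetical protein\n"

# Strip the trailing newline, then split on the pipe character
ids = header_line.rstrip('\n').split('|')

# Show each field with its index so you can pick the one you need
for index, field in enumerate(ids):
    print(index, repr(field))
```

With this header, index 1 is the gi number and index 3 is the accession (AAC07663.1), which is why the answer uses ids[3].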

Answer 1: 0 votes, posted 2022-01-23 13:01:56
