Bcbio-gff file creation issue
When creating a file using GFF.write(), I get a new line with "annotation remark" as the source, followed by a percent-encoded list of sequence regions:
##gff-version 3
##sequence-region NC_011594.1 1 16779
NC_011594.1 annotation remark 1 16779 . . . gff-version=3;sequence-region=%28%27NC_011594.1%27%2C 0%2C 16971%29,%28%27NC_042493.1%27%2C 0%2C 132544852%29, (continues on and on)
NC_011594.1 RefSeq gene 1 1531 . + . Dbxref=GeneID:7055888;ID=gene-COX1;Name=COX1;gbkey=Gene;gene=COX1;gene_biotype=protein_coding
Any idea why it's there, what it's for, and how I could avoid it? I fear it might become a problem when using the file in third-party software.
I imported only the bcbio-gff package, but I believe it's part of Biopython; link: https://biopython.org/wiki/GFF_Parsing
To your first question - "Why is it there?"
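As a quick check of what that line holds, the attribute value can be percent-decoded with the standard library. This is only a minimal sketch using a fragment copied from the question's output; the decoded text looks like a stringified Python tuple of sequence regions, which would be consistent with the writer dumping the record's annotations, though that reading is an inference from the output rather than something confirmed here.

```python
from urllib.parse import unquote

# Fragment of the attribute column, copied from the question's output.
encoded = "%28%27NC_011594.1%27%2C 0%2C 16971%29"

# %28 -> "(", %27 -> "'", %2C -> ",", %29 -> ")"
decoded = unquote(encoded)
print(decoded)  # ('NC_011594.1', 0, 16971)
```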
To your next question - "How can I avoid it?"
Set the annotations attribute to an empty dictionary before calling GFF.write(). Example:
from Bio import SeqIO
from BCBio import GFF

# Read a single GenBank record
g = SeqIO.read('NC_003888.3.gb', 'gb')
# Clear the annotations so no "annotation remark" line is written
g.annotations = {}
with open('t2.gff', 'w') as f:
    GFF.write([g], f)
Output file head (no "annotation remark" line):
head t2.gff
##gff-version 3
##sequence-region NC_003888.3 1 8667507
NC_003888.3 feature source 1 8667507 ... removed for clarity ....
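If the file has already been written, the same line could also be dropped afterwards. The following is a minimal sketch, not part of bcbio-gff: the helper name `drop_annotation_remarks` is hypothetical, and it assumes the columns are tab-separated with "annotation" in the source column and "remark" in the type column, as the output in the question suggests.

```python
def drop_annotation_remarks(lines):
    """Yield GFF lines, skipping 'annotation remark' records (hypothetical helper)."""
    for line in lines:
        if not line.startswith("#"):
            cols = line.rstrip("\n").split("\t")
            # Column 2 is the source, column 3 the feature type (assumed layout).
            if len(cols) >= 3 and cols[1] == "annotation" and cols[2] == "remark":
                continue
        yield line


# Small in-memory sample modelled on the question's output.
sample = [
    "##gff-version 3\n",
    "##sequence-region NC_011594.1 1 16779\n",
    "NC_011594.1\tannotation\tremark\t1\t16779\t.\t.\t.\tgff-version=3\n",
    "NC_011594.1\tRefSeq\tgene\t1\t1531\t.\t+\t.\tID=gene-COX1;Name=COX1\n",
]
cleaned = list(drop_annotation_remarks(sample))
print("".join(cleaned), end="")
```

Clearing `annotations` before writing, as shown above, is the simpler route; a filter like this is only useful when regenerating the file is not an option.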