Reading gzipped text file line-by-line for processing in python 3.2.6
I'm a complete newbie when it comes to Python, but I've been tasked with getting a piece of code running on a machine that has a different version of Python (3.2.6) from the one the code was originally written for.
I've come across an issue with reading a gzipped text file line-by-line (and processing each line depending on its first character). The code (which was evidently written for Python > 3.2.6) is:
    for line in gzip.open(input[0], 'rt'):
        if line[:1] != '>':
            out.write(line)
            continue
        chromname = match2chrom(line[1:-1])
        seqname = line[1:].split()[0]
        print('>{}'.format(chromname), file=out)
        print('{}\t{}'.format(seqname, chromname), file=mappingout)
(For those who know, this separates gzipped FASTA genome files into headers (with ">" at the start) and sequences, and writes the lines to two different files depending on which they are.)
I have found https://bugs.python.org/issue13989 , which states that mode 'rt' cannot be used for gzip.open in python-3.2 and suggests using something along the lines of:
    import io
    with io.TextIOWrapper(gzip.open(input[0], "r")) as fin:
        for line in fin:
            if line[:1] != '>':
                out.write(line)
                continue
            chromname = match2chrom(line[1:-1])
            seqname = line[1:].split()[0]
            print('>{}'.format(chromname), file=out)
            print('{}\t{}'.format(seqname, chromname), file=mappingout)
but the above code does not work:
UnsupportedOperation in line <4> of /path/to/python_file.py:
read1
How can I rewrite this routine to do exactly what I want: read the gzip file line-by-line into the variable "line" and process each line based on its first character?
EDIT: the traceback from the first version of this routine (python 3.2.6) is:
Mode rt not supported
File "/path/to/python_file.py", line 79, in __process_genome_sequences
File "/opt/python-3.2.6/lib/python3.2/gzip.py", line 46, in open
File "/opt/python-3.2.6/lib/python3.2/gzip.py", line 157, in __init__
Traceback from the second version is:
UnsupportedOperation in line 81 of /path/to/python_file.py:
read1
File "/path/to/python_file.py", line 81, in __process_genome_sequences
with no further traceback (the extra two lines in the line count are the `import io` and `with io.TextIOWrapper(gzip.open(input[0], "r")) as fin:` lines).
I appear to have solved the problem.
In the end I had to use shell("gunzip {input[0]}") to decompress the file first, so that it could be read in text mode, and then read the resulting file using:
    for line in open('<resulting file>', 'r'):
        if line[:1] != '>':
            out.write(line)
            continue
        chromname = match2chrom(line[1:-1])
        seqname = line[1:].split()[0]
        print('>{}'.format(chromname), file=out)
        print('{}\t{}'.format(seqname, chromname), file=mappingout)
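A possible alternative that avoids decompressing the file on disk is to open it in binary mode and decode each line by hand. Iterating over a binary gzip file goes through readline rather than read1, so it should not hit the UnsupportedOperation above. This is only a minimal sketch: I have not run it on 3.2.6 itself, the helper name `iter_gzip_lines` is made up for illustration, and it assumes the FASTA file is plain ASCII/UTF-8.

```python
import gzip


def iter_gzip_lines(path, encoding="utf-8"):
    """Yield decoded text lines from a gzip file opened in binary mode.

    Hypothetical helper, not part of the original code.
    """
    # Binary mode ('rb') avoids the 'rt' mode that gzip.open rejects on 3.2.
    with gzip.open(path, "rb") as fin:
        for raw in fin:
            # Each item is a bytes object, normally ending in b"\n".
            # The encoding is an assumption; adjust it if the file differs.
            yield raw.decode(encoding)
```

The loop header would then become `for line in iter_gzip_lines(input[0]):` with the body left unchanged. One difference from real text mode: no newline translation is done, so a file with Windows line endings would keep its "\r\n".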