Reading a gzip-compressed text file line by line for processing in Python 3.2.6

Question

I'm a complete newbie when it comes to Python, but I've been tasked with getting a piece of code running on a machine that has a different version of Python (3.2.6) from the one the code was originally written for.

I've come across an issue with reading a gzipped text file line by line (and processing each line depending on its first character). The code (which was evidently written for a Python version later than 3.2.6) is:

for line in gzip.open(input[0], 'rt'):
    if line[:1] != '>':
        out.write(line)
        continue

    chromname = match2chrom(line[1:-1])
    seqname = line[1:].split()[0]

    print('>{}'.format(chromname), file=out)
    print('{}\t{}'.format(seqname, chromname), file=mappingout)

(For those who know: this splits gzipped FASTA genome files into headers (lines starting with ">") and sequences, and writes the processed lines to two different files depending on which kind of line it is.)
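To make the intended behaviour concrete, here is a small self-contained sketch of the same loop running on an in-memory FASTA snippet. The function name split_fasta and the body of match2chrom are hypothetical stand-ins (the real match2chrom is not shown in this post); only the loop body mirrors the code above.

```python
import io


def match2chrom(description):
    # Hypothetical stand-in: the real match2chrom is not shown in the post.
    # Here it simply upper-cases the first word of the header text.
    return description.split()[0].upper()


def split_fasta(lines, out, mappingout):
    """Copy sequence lines to `out`; rewrite header lines and record the mapping."""
    for line in lines:
        if line[:1] != '>':
            # Sequence line: pass it through unchanged.
            out.write(line)
            continue

        # Header line: drop the leading '>' and the trailing newline.
        chromname = match2chrom(line[1:-1])
        seqname = line[1:].split()[0]

        print('>{}'.format(chromname), file=out)
        print('{}\t{}'.format(seqname, chromname), file=mappingout)


# Any iterable of text lines works, so an in-memory file is enough for a demo.
fasta = io.StringIO(">seq1 chromosome 1\nACGT\nTTGA\n")
out, mappingout = io.StringIO(), io.StringIO()
split_fasta(fasta, out, mappingout)
```

Because the loop only needs an iterable of text lines, the gzip question reduces to producing such an iterable on Python 3.2.6.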

I have found https://bugs.python.org/issue13989 , which states that mode 'rt' cannot be used with gzip.open in Python 3.2 and suggests using something along the lines of:

import io

with io.TextIOWrapper(gzip.open(input[0], "r")) as fin:
     for line in fin:
         if line[:1] != '>':
             out.write(line)
             continue

         chromname = match2chrom(line[1:-1])
         seqname = line[1:].split()[0]

         print('>{}'.format(chromname), file=out)
         print('{}\t{}'.format(seqname, chromname), file=mappingout)

but the above code does not work:

UnsupportedOperation in line <4> of /path/to/python_file.py:
read1

How can I rewrite this routine to do exactly what I want: read the gzip file line by line into the variable "line" and process each line based on its first character?

EDIT: the traceback from the first version of this routine (Python 3.2.6) is:

Mode rt not supported  
File "/path/to/python_file.py", line 79, in __process_genome_sequences  
File "/opt/python-3.2.6/lib/python3.2/gzip.py", line 46, in open  
File "/opt/python-3.2.6/lib/python3.2/gzip.py", line 157, in __init__

The traceback from the second version is:

UnsupportedOperation in line 81 of /path/to/python_file.py:
read1
File "/path/to/python_file.py", line 81, in __process_genome_sequences

with no further traceback. (The extra two lines in the line count are the import io and with io.TextIOWrapper(gzip.open(input[0], "r")) as fin: lines.)
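For context on the second traceback: io.TextIOWrapper fetches data by calling read1() on the stream it wraps, and the gzip documentation notes that GzipFile only gained a read1() implementation in Python 3.3. The sketch below is an illustration under that assumption, not the gzip module itself: NoRead1 and try_readline are hypothetical names, and NoRead1 is a toy buffered stream that, like GzipFile in 3.2, provides read() but inherits the unimplemented read1().

```python
import io


class NoRead1(io.BufferedIOBase):
    """Hypothetical stream that implements read() but not read1()."""

    def __init__(self, data):
        self._raw = io.BytesIO(data)

    def readable(self):
        return True

    def read(self, size=-1):
        return self._raw.read(size)


def try_readline(stream):
    """Wrap `stream` for text access and report what the first readline() does."""
    wrapper = io.TextIOWrapper(stream, encoding='ascii')
    try:
        return wrapper.readline()
    except io.UnsupportedOperation as exc:
        # The exception message names the missing method.
        return 'UnsupportedOperation: {}'.format(exc)


result = try_readline(NoRead1(b">seq1\nACGT\n"))
print(result)
```

If the assumption holds, the wrapper fails on the first read rather than when it is constructed, which would match the error being reported at the for line in fin: line.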

Answer 1

I appear to have solved the problem.

In the end I had to use shell("gunzip {input[0]}") so that the decompressed file could be read in text mode, and then read the resulting file using:

for line in open(' *< resulting file >* ','r'):
    if line[:1] != '>':
        out.write(line)
        continue

    chromname = match2chrom(line[1:-1])
    seqname = line[1:].split()[0]

    print('>{}'.format(chromname), file=out)
    print('{}\t{}'.format(seqname, chromname), file=mappingout)
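Two possible alternatives avoid decompressing the file on disk. This is a sketch that has not been verified on 3.2.6 itself; the helper names open_gzip_text and iter_gzip_lines, the demo file, and the 'ascii' encoding are assumptions for illustration. The first wraps the GzipFile in io.BufferedReader, which supplies the read1() method that io.TextIOWrapper asks for; the second reads the file in binary mode and decodes each line by hand.

```python
import gzip
import io
import os
import shutil
import tempfile


def open_gzip_text(path, encoding='ascii'):
    """Text-mode view of a gzip file without using mode 'rt'.

    BufferedReader sits between GzipFile and TextIOWrapper and provides read1().
    """
    return io.TextIOWrapper(io.BufferedReader(gzip.open(path, 'rb')),
                            encoding=encoding)


def iter_gzip_lines(path, encoding='ascii'):
    """Alternative: iterate over raw bytes lines and decode each one."""
    with gzip.open(path, 'rb') as fin:
        for raw in fin:
            yield raw.decode(encoding)


# Demo on a small throwaway file (hypothetical content).
tmpdir = tempfile.mkdtemp()
demo_path = os.path.join(tmpdir, 'demo.fa.gz')
with gzip.open(demo_path, 'wb') as fout:
    fout.write(b">seq1 chromosome 1\nACGT\n")

with open_gzip_text(demo_path) as fin:
    wrapped_lines = list(fin)

decoded_lines = list(iter_gzip_lines(demo_path))

shutil.rmtree(tmpdir)
```

Either function could replace open(...) in the loop above, for example for line in iter_gzip_lines(input[0]):. Unlike gunzip, which by default replaces the .gz file with the decompressed one, both leave the compressed input untouched.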

Answer 1 · 0 · accepted · 2015-10-25 23:46:17