Transpose a large tab-delimited file in Python

I am trying to transpose a huge tab-delimited file with about 6000 rows and 2 million columns. The preferred approach should not involve holding the whole file in memory, which seems to be what the answer to this question does:

How to do row-to-column transposition of data in csv table?

One approach would be to iterate over the input file once for every column (untested code!):

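import itertools

with open("input") as f, open("output", "w") as g:
    try:
        # One full pass over the input per column.
        for column_index in itertools.count():
            f.seek(0)
            # rstrip("\n") keeps the newline out of the last column.
            col = [line.rstrip("\n").split("\t")[column_index] for line in f]
            if not col:  # empty input: nothing to transpose
                break
            g.write("\t".join(col) + "\n")
    except IndexError:
        # Raised once column_index runs past the last column: done.
        pass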

This is going to be very slow, because it rereads the whole input once per column, but it only keeps a single input line and one output row in memory at a time.
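A possible middle ground, sketched here as an assumption rather than something from the original answer, is to extract a block of columns on each pass instead of a single column. The function name `transpose_in_blocks` and the `block_size` parameter are hypothetical, and the sketch assumes every row has the same number of tab-separated fields.

```python
def transpose_in_blocks(src_path, dst_path, block_size=1000):
    """Transpose a tab-delimited file, handling block_size columns per pass.

    Assumes a rectangular input (every row has the same number of fields).
    """
    with open(src_path) as src, open(dst_path, "w") as dst:
        start = 0
        while True:
            src.seek(0)
            # columns[i] collects the values of input column start + i.
            columns = [[] for _ in range(block_size)]
            width = 0  # number of columns actually present in this block
            for line in src:
                fields = line.rstrip("\n").split("\t")[start:start + block_size]
                width = max(width, len(fields))
                for col, value in zip(columns, fields):
                    col.append(value)
            if width == 0:
                # No fields left at this offset: every column is written.
                break
            for col in columns[:width]:
                dst.write("\t".join(col) + "\n")
            start += block_size
```

Calling `transpose_in_blocks("input", "output", block_size=1000)` on a file with 2 million columns would take about 2,000 passes rather than 2 million. Memory use grows with the number of rows times `block_size`, so the block size is a knob to tune against available RAM.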
