
Fast rearranging of a text file

I wrote the following code to rewrite a text file in a given order. The order is specified in gA, a list of the form [[fN0, value0], [fN1, value1], ...]. I sorted this list by value and want to write the lines out in that order.

My code works, but it is very slow on my input: the file has 50 million rows, and processing it would take about 2 months. I am therefore looking for ways to speed this code up. Any ideas are welcome.

for k in gA:
    fN = k[0]
    for lineNum, line in enumerate(slicedFile,start=0):
        num, restOfLine = line.split('\t',1)
        if num == fN:
            out.write(line)
    inp.seek(0)

Your loop rescans the entire file once for every entry in gA, which is why it is so slow. Instead, read the whole file into memory once and build a dict that maps each num to the list of lines beginning with that num. Then you can iterate through gA a single time and write out the lines from that dict:

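from collections import defaultdict

# One pass over the file: group the lines by their first tab-separated field.
lines = defaultdict(list)
for line in slicedFile:
    num, restOfLine = line.split('\t', 1)
    lines[num].append(line)

# One pass over gA: write each group of lines in the sorted order.
for fN, dummy in gA:
    for line in lines[fN]:
        out.write(line)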
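
For reference, here is the same approach as a self-contained, runnable sketch. The function name rearrange and the sample data are made up for illustration, and io.StringIO objects stand in for the real input and output files, so treat it as a sketch under those assumptions rather than a drop-in replacement.

```python
import io
from collections import defaultdict


def rearrange(in_file, gA, out):
    """Write the lines of in_file to out, grouped in the order given by gA.

    Hypothetical helper: in_file is any iterable of tab-separated lines,
    gA is a list of [key, value] pairs that is already sorted by value.
    """
    # One pass over the input: group lines by their first tab-separated field.
    lines = defaultdict(list)
    for line in in_file:
        num, _rest = line.split('\t', 1)
        lines[num].append(line)

    # One pass over gA: write each group in the requested order.
    # .get() is used so that keys absent from the file are skipped
    # without adding empty lists to the dict.
    for fN, _value in gA:
        out.writelines(lines.get(fN, ()))


# Made-up sample data, only for demonstration.
sample = io.StringIO("b\tsecond\na\tfirst\nb\tsecond again\nc\tthird\n")
gA = [["a", 1], ["b", 2], ["c", 3]]
result = io.StringIO()
rearrange(sample, gA, result)
print(result.getvalue(), end="")
```

Each input line is read once and each entry of gA is visited once, so the work grows with the size of the file plus the length of gA instead of their product. The trade-off is that all lines are held in memory at the same time.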

Note: I'm using defaultdict only to shorten the code. When a missing key is accessed in a defaultdict, the entry is created automatically (in this case as an empty list), so I can call .append() on it directly.
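
To illustrate the note above, this small sketch compares defaultdict with a plain dict using setdefault. The variable names and sample strings are invented for the example.

```python
from collections import defaultdict

# With defaultdict, a missing key is created as an empty list on first access.
groups = defaultdict(list)
groups["a"].append("a\tfirst\n")

# The equivalent with a regular dict needs setdefault (or an explicit check).
plain = {}
plain.setdefault("a", []).append("a\tfirst\n")

# Caveat: merely reading a missing key with [] also creates it,
# whereas .get() leaves the dict unchanged.
_ = groups["missing"]
_ = groups.get("other")

print(sorted(groups))
```

Both dicts end up with the same list under "a". The difference only matters when many keys are looked up that never occur in the file, because each such lookup with [] leaves an empty list behind.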
