I wrote the following code to rewrite a text file in a given order. This order is specified in gA. gA is a list: [[fN0,value0],[fN1,value1] ...]. I sorted this list by value and want to write out the file respecting this order.
My code works fine, but it is very slow on my input (I have an input with 50 million rows, and it would take 2 months to process). Therefore, I am looking for ways to speed up this code. Any idea is welcome.
for k in gA:
    fN = k[0]
    for lineNum, line in enumerate(slicedFile,start=0):
        num, restOfLine = line.split('\t',1)
        if num == fN:
            out.write(line)
    inp.seek(0)
You should read the whole file into memory and put all lines in a dict that maps each num to a list of the lines beginning with that num. Then you can iterate once through gA and write out all lines from that dict:
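from collections import defaultdict

# First pass: group every line under its leading key.
lines = defaultdict(list)
for line in slicedFile:
    num, restOfLine = line.split('\t', 1)
    lines[num].append(line)

# Second pass: write the groups in the order given by gA.
for fN, dummy in gA:
    for line in lines[fN]:
        out.write(line)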
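To try the idea on a small input, here is a minimal self-contained sketch. The function name reorder_lines and the use of io.StringIO as stand-ins for the real input and output files are assumptions made for illustration; they are not part of the original code. It also assumes gA is already sorted and that every line contains a tab, as in the question.

```python
import io
from collections import defaultdict


def reorder_lines(src, dst, gA):
    """Write the lines of src to dst, grouped in the key order given by gA."""
    # Pass 1: group every line under the key before the first tab.
    lines = defaultdict(list)
    for line in src:
        num, _rest = line.split('\t', 1)
        lines[num].append(line)
    # Pass 2: emit the groups in gA order.
    for fN, _value in gA:
        # .get() skips keys that never occur in the file without creating them.
        for line in lines.get(fN, ()):
            dst.write(line)


# Hypothetical sample data standing in for the real files.
src = io.StringIO("b\tsecond\na\tfirst\nb\tthird\n")
dst = io.StringIO()
reorder_lines(src, dst, [["a", 1], ["b", 2]])
result = dst.getvalue()
```

This reads the file once and walks gA once, instead of rereading the whole file for every entry in gA as the original loop does.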
Note: I'm using defaultdict just to shorten the code. If a non-existing key is accessed in such a defaultdict, it gets created automatically (in this case as an empty list), so I can just call .append() on the element.
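The difference from a plain dict can be seen in a short sketch. The variable names groups, plain and raised are only illustrative.

```python
from collections import defaultdict

groups = defaultdict(list)
# Key "7" does not exist yet; an empty list is created, then appended to.
groups["7"].append("7\tfoo\n")
groups["7"].append("7\tbar\n")

# Merely reading a missing key also creates it.
_ = groups["9"]
created_by_read = "9" in groups

# A plain dict raises KeyError in the same situation.
plain = {}
try:
    plain["7"].append("7\tfoo\n")
    raised = False
except KeyError:
    raised = True
```

Because even a read creates the key, looking up many keys that never occur in the file will add empty lists to the defaultdict; using .get(key, ()) for lookups avoids that.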