I'm checking for data migration failures in my database. My Python script works fine on a smaller amount of data, but it now stalls in the middle of execution: the command stays in a running state yet at some point no longer appears to make progress, and I have to abort it manually with Ctrl+C.
Code and comments below:
import csv

a = []
with open('FailedIds.txt') as my_file:
    for line in my_file:
        a.append(line)  # builds a list of unique row IDs that failed in migration; contains 680k rows

with open("OldDbAll.txt", 'r') as f:
    l = list(csv.reader(f))

# builds a dictionary containing all rows and columns from our old DB:
# key = column header, values = lists of column values.
# Contains 3 million rows and 9 columns, 200MB in file size.
dict = {i[0]: [x for x in i[1:]] for i in zip(*l)}

string = ''
print("Done building dictionary")

with open('Fix.txt', 'w') as f:
    print(",".join(dict.keys()), file=f)
    for i in range(len(dict['UNIQUEID'])):
        for j in range(len(a)):
            if a[j].strip() == dict['UNIQUEID'][i]:  # match failure row ID against the dictionary's unique-ID list
                for key in dict:
                    string += dict[key][i] + ","  # collects the data to be re-migrated
                print(string, file=f)
                string = ''
When I first ran this script overnight, I got around 50k rows of output before manually aborting it. I thought that was acceptable, since my computer might have hibernated. This morning, however, I had only 1k rows after running the script through all of yesterday and into the night. I plan to restart my computer and disable sleep next time, but I would like to get all 600k+ rows as the output, and currently I'm nowhere near that amount.
I searched around, and Python's list size limit should be well above what I'm using it for, so something else is causing the program to hang. Any thoughts would be appreciated!
I believe this loop is the reason your code takes so long to run:
for key in dict:
    string += dict[key][i] + ","  # collects the data to be re-migrated
print(string, file=f)
string = ''
String concatenation is slow, and this loop does a lot of it.
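To see the difference on your own machine, here is a minimal, self-contained `timeit` comparison (the data is made up; repeated `+=` can degrade to quadratic copying, while `str.join` makes a single linear pass):

```python
import timeit

def build_with_concat(parts):
    # repeated += may copy the growing string on each iteration
    s = ''
    for p in parts:
        s += p + ','
    return s

def build_with_join(parts):
    # single pass: join all pieces at once, then add the trailing comma
    return ','.join(parts) + ','

parts = ['x'] * 10_000
print(timeit.timeit(lambda: build_with_concat(parts), number=100))
print(timeit.timeit(lambda: build_with_join(parts), number=100))
```

Both functions produce the same string; only the cost differs.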
I don't think you need to concatenate at all -- just write to the file as you go:
for key in dict:
    f.write(dict[key][i] + ",")