I'm checking for data migration failures in my database. My Python script works fine on a smaller amount of data, but it now stalls in the middle of execution: the command stays in a running state yet at some point no longer appears to make progress, and I have to abort it manually with Ctrl+C.
Code and comments below:
import csv

a = []
with open('FailedIds.txt') as my_file:
    for line in my_file:
        a.append(line)  # builds a list of unique row IDs that failed in migration; contains 680k rows

with open("OldDbAll.txt", 'r') as f:
    l = list(csv.reader(f))

# builds a dictionary containing all rows and columns from our old DB:
# key = column header, values = lists of column values.
# Contains 3 million rows and 9 columns, 200MB in file size.
dict = {i[0]: [x for x in i[1:]] for i in zip(*l)}

string = ''
print("Done building dictionary")

with open('Fix.txt', 'w') as f:
    print(",".join(dict.keys()), file=f)
    for i in range(len(dict['UNIQUEID'])):
        for j in range(len(a)):
            if a[j].strip() == dict['UNIQUEID'][i]:  # match failure row ID against the dictionary's unique-ID list
                for key in dict:
                    string += dict[key][i] + ","  # collects the data to be re-migrated
                print(string, file=f)
                string = ''
When I first ran this script overnight, I got around 50k rows of output before manually aborting it. I thought that was acceptable, since my computer might have hibernated. This morning, however, I had only 1k rows after running the script through all of yesterday and into the night. I plan to restart my computer and disable sleep next time, but I would like to get all 600k+ rows as the output, and currently I'm nowhere near that amount.
I searched around, and Python's list size limit should be well above what I'm using it for, so something else is causing the program to hang. Any thoughts would be appreciated!
I believe this loop is the reason your code takes so long to run:
for key in dict:
    string += dict[key][i] + ","  # collects the data to be re-migrated
print(string, file=f)
string = ''
String concatenation is slow, and this loop does a lot of it.
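To see the difference on your own machine, here is a minimal, self-contained `timeit` comparison (the data is made up; repeated `+=` can degrade to quadratic copying, while `str.join` makes a single linear pass):

```python
import timeit

def build_with_concat(parts):
    # repeated += may copy the growing string on each iteration
    s = ''
    for p in parts:
        s += p + ','
    return s

def build_with_join(parts):
    # single pass: join all pieces at once, then add the trailing comma
    return ','.join(parts) + ','

parts = ['x'] * 10_000
print(timeit.timeit(lambda: build_with_concat(parts), number=100))
print(timeit.timeit(lambda: build_with_join(parts), number=100))
```

Both functions produce the same string; only the cost differs.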
I don't think you need to concatenate at all -- just write to the file as you go:
for key in dict:
    f.write(dict[key][i] + ",")