Script stopping during execution

I'm checking for data migration failures in my database. My Python script works fine on a smaller amount of data, but on the full dataset it stalls partway through execution: the cmd window still shows the script as running, but at some point it stops making visible progress, and I have to abort it manually with Ctrl+C.

Code and comments below:

import collections
import csv

a=[]

with open('FailedIds.txt') as my_file:
    for line in my_file:
        a.append(line) #builds array of unique row IDs that failed in migration. Contains 680k rows.

with open("OldDbAll.txt", 'r') as f:
    l = list(csv.reader(f))
    dict = {i[0]:[(x) for x in i[1:]] for i in zip(*l)} #builds dictionary containing all rows and columns from our old DB, key = column header, values = arrays of values. Contains 3 million rows and 9 columns, 200MB in file size.

string=''
print("Done building dictionary")

with open('Fix.txt', 'w') as f:
    print(",".join(dict.keys()),file=f)
    for i in range(len(dict['UNIQUEID'])):
        for j in range(len(a)):
            if a[j].strip()==dict['UNIQUEID'][i]: #matching a failed row ID against the dictionary's UNIQUEID column
                for key in dict:
                    string+=dict[key][i]+"," #collects the row data to be re-migrated
                print(string,file=f)
                string=''

When I first ran this script overnight, I had around 50k rows of output by the time I manually aborted it. I assumed that was acceptable because my computer might have hibernated. However, this morning I had only 1k rows after letting the script run through yesterday and overnight. I plan to restart my computer and set it not to sleep next time, but I need all 600k+ rows in the output, and currently I'm nowhere near that amount.

I've searched around, and Python's list size limit should be well above what I'm using it for, so something else is causing the program to hang. Any thoughts would be appreciated!

I believe this loop is the reason your code takes so long to run:

for key in dict:
    string+=dict[key][i]+"," #collects the row data to be re-migrated
print(string,file=f)
string=''

String concatenation is slow, and this loop does a lot of it: each += can allocate a new string and copy the growing row into it, so the cost climbs as the row gets longer.
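A quick way to see the difference is to time both approaches. Here is a minimal sketch (the list size and function names are just for illustration; CPython can sometimes grow a string in place, so the exact gap varies, but join is reliably a single pass):

import timeit

def build_with_concat(parts):
    s = ''
    for p in parts:
        s += p + ','  # may copy the whole growing string each iteration
    return s

def build_with_join(parts):
    return ','.join(parts) + ','  # one pass, one allocation

parts = ['field'] * 100_000
print('concat:', timeit.timeit(lambda: build_with_concat(parts), number=10))
print('join:  ', timeit.timeit(lambda: build_with_join(parts), number=10))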

I don't think you need to concatenate at all -- just write to the file as you go:

for key in dict:
    f.write(dict[key][i] + ",")
f.write("\n")  # end the row, as print(string, file=f) did before
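Since your script already imports csv, another option is to let csv.writer handle the separators and line endings instead of joining fields by hand. Here is a sketch of the output section under that approach (not tested against your data; it reuses your variable names, including dict, which shadows the built-in, and the a list and column dictionary built earlier in the script):

import csv

with open('Fix.txt', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(dict.keys())  # header row
    for i in range(len(dict['UNIQUEID'])):
        for j in range(len(a)):
            if a[j].strip() == dict['UNIQUEID'][i]:
                # write the matched row directly; no string building needed
                writer.writerow(dict[key][i] for key in dict)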
