
Script stopping during execution

I'm doing a check for data migration failures in my database. My Python script works fine on a small amount of data, but it currently stops in the middle of execution: the cmd window still shows it as running, but at some point it stops making progress, and I have to abort it manually with Ctrl+C.

Code and comments below:

import collections
import csv

a=[]

with open('FailedIds.txt') as my_file:
    for line in my_file:
        a.append(line) #builds array of unique row IDs that failed in migration. Contains 680k rows.

with open("OldDbAll.txt", 'r') as f:
    l = list(csv.reader(f))
    dict = {i[0]:[(x) for x in i[1:]] for i in zip(*l)} #builds dictionary containing all rows and columns from our old DB, key = column header, values = arrays of values. Contains 3 million rows and 9 columns, 200MB in file size.

string=''
print("Done building dictionary")

with open('Fix.txt', 'w') as f:
    print(",".join(dict.keys()), file=f)
    for i in range(len(dict['UNIQUEID'])):
        for j in range(len(a)):
            if a[j].strip() == dict['UNIQUEID'][i]: #matching failure row ID to the dictionary unique ID array
                for key in dict:
                    string += dict[key][i] + "," #prints the data to be re-migrated
                print(string, file=f)
                string = ''
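
For reference, on a tiny input the zip(*l) comprehension builds a column-oriented dict like this (illustration only, not part of the actual script; the sample values are made up):

rows = [["UNIQUEID", "NAME"], ["1", "a"], ["2", "b"]]
cols = {i[0]: [x for x in i[1:]] for i in zip(*rows)}
# cols == {'UNIQUEID': ['1', '2'], 'NAME': ['a', 'b']}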

When I first ran this script overnight, I got around 50k rows of output after manually aborting it. I thought that was OK because my computer might have hibernated. However, this morning I got only 1k rows after running the script all of yesterday and into the night. I plan to restart my computer and set it not to sleep next time, but I need all 600k+ rows in the output, and right now I'm nowhere near that amount.

I searched around, and Python's list size limit should be well above what I'm using it for, so something else is causing the program to hang. Any thoughts would be appreciated!

I believe this loop is the reason your code takes so long to run:

for key in dict:
    string += dict[key][i] + "," #prints the data to be re-migrated
print(string, file=f)
string = ''

String concatenation is slow, and this loop does a lot of it.
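
As a rough illustration (my own sketch, not from the original post; exact timings vary by machine and Python version), the repeated += can be compared against a single join with timeit:

import timeit

parts = ["v"] * 9  # nine fields, like one row in the question

def concat():
    s = ""
    for x in parts:
        s += x + ","  # repeated concatenation: may copy the string each time
    return s

def joined():
    return ",".join(parts) + ","  # single allocation for the whole row

print(timeit.timeit(concat, number=1_000_000))
print(timeit.timeit(joined, number=1_000_000))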

I don't think you need to concatenate at all -- just write to the file as you go:

for key in dict:
    f.write(dict[key][i] + ",")  # write each field directly instead of concatenating
f.write("\n")  # end the row, as print(string, file=f) did in the original
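
Equivalently (a sketch assuming the same dict and loop index i as in the question, untested against your real data), the whole row can be built with one join and written at once:

row = ",".join(dict[key][i] for key in dict)  # one string per row
print(row, file=f)

Note this drops the trailing comma the original rows had, which is usually what you want in CSV output.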
