Delete a line from a big CSV file in Python

I have an 11 GB CSV file containing some corrupt lines that I have to delete. I have identified the corrupted line numbers from an ETL interface.

My program works on small datasets, but when I run it on the main file I get a MemoryError. Below is the code I'm using. Do you have any suggestions to make it work?

row_to_delete = 101068
filename = "EKBE_0_20180907_065907 - Copy.csv"
with open(filename, 'r', encoding='utf8' ,errors='ignore') as file:
    data = file.readlines()
    print(data[row_to_delete -1 ])
    data [row_to_delete -1] = ''
with open(filename, 'wb',encoding="utf8",errors='ignore') as file:
    file.writelines( data )

Error:

Traceback (most recent call last):
  File "/.PyCharmCE2018.2/config/scratches/scratch_7.py", line 7, in <module>
    data = file.readlines()
MemoryError

Rather than reading the whole file into memory, loop over the input file and write every line except the one you need to delete to a new file. Use enumerate() to keep a counter if you need to delete by index:

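row_to_delete = 101068  # 1-based line number, as in the question
filename = "EKBE_0_20180907_065907 - Copy.csv"
# text mode ('w', not 'wb') is required when passing an encoding;
# newline='' copies the original line endings through unchanged
with open(filename, 'r', encoding='utf8', errors='ignore', newline='') as inputfile, \
     open(filename + '.fixed', 'w', encoding='utf8', newline='') as outputfile:
    # start=1 so the counter matches the 1-based line number
    for index, line in enumerate(inputfile, start=1):
        if index == row_to_delete:
            continue  # don't write the line that matches
        outputfile.write(line)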

Rather than using an index, you could even detect a bad line directly in code this way.
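As a rough sketch of what detecting a bad line in code could look like: the version below assumes, purely for illustration, that a line is "bad" when it does not parse into the expected number of fields. The names is_bad_line, copy_good_lines and expected_fields are hypothetical, and the real criterion depends on how your file is corrupted.

```python
import csv

def is_bad_line(line, expected_fields, delimiter=","):
    """Hypothetical check: treat a line as bad if its field count is wrong."""
    try:
        # Parse the single line with the csv module so quoted delimiters are respected
        fields = next(csv.reader([line], delimiter=delimiter))
    except (csv.Error, StopIteration):
        return True
    return len(fields) != expected_fields

def copy_good_lines(src, dst, expected_fields):
    """Stream src to dst line by line, skipping bad lines; return how many were skipped."""
    skipped = 0
    with open(src, 'r', encoding='utf8', errors='ignore', newline='') as inputfile, \
         open(dst, 'w', encoding='utf8', newline='') as outputfile:
        for line in inputfile:
            if is_bad_line(line, expected_fields):
                skipped += 1
                continue
            outputfile.write(line)
    return skipped
```

Only one line is held in memory at a time, so the file size does not matter. Note that this checks physical lines, so a quoted field that legitimately contains a line break would be flagged as bad.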

Note that this writes to a new file with the same name but with .fixed appended.

Once you are done copying all but the bad line, you can move that file back to replace the old file with os.rename() if you want to:

os.rename(filename + '.fixed', filename)
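Since the question mentions several corrupted line numbers, the copy and the final move can be combined into one helper. This is only a sketch: delete_lines is a hypothetical name, the line numbers are assumed to be 1-based, and it uses os.replace() instead of os.rename() because os.rename() raises an error on Windows when the destination already exists.

```python
import os

def delete_lines(filename, rows_to_delete):
    """Stream-copy filename without the given 1-based line numbers, then replace it."""
    rows_to_delete = set(rows_to_delete)  # set gives fast membership tests
    temp_name = filename + '.fixed'
    removed = 0
    with open(filename, 'r', encoding='utf8', errors='ignore', newline='') as inputfile, \
         open(temp_name, 'w', encoding='utf8', newline='') as outputfile:
        for line_number, line in enumerate(inputfile, start=1):
            if line_number in rows_to_delete:
                removed += 1
                continue  # skip the corrupted line
            outputfile.write(line)
    # Overwrite the original only after the copy has finished successfully
    os.replace(temp_name, filename)
    return removed
```

For example, delete_lines(filename, [101068]) would drop the single line from the question. Keep a backup of the original until you have checked the result, since the replacement cannot be undone.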
