Python not reading/writing entire csv file

I'm working on a project where I have to parse a huge csv file with 500,000 rows. Below is a small portion of the code as an example. It breaks up the columns fine, but it only reads 9,132 rows when I need it to go through all 500,000. The csv is encoded in cp1252, which I have a feeling might be part of the issue, but I am not sure. Here is the error I am getting:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 4123: character maps to <undefined>

Code:

import csv

outfile = open("newFile.csv", 'w')
with open("ProductFile.csv", "r") as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        Item_ID = row[0]
        Sku = row[1]
        SKU_ID = row[2]
        altpartnum = row[3]
        Application = row[4]
        Brandcode = row[5]

        line = "{},{},{},{},{},{},\n".format(
            Item_ID, AD_SKU_ID, MemberSku, Application, Brandcode, Application, Brandcode)
        outfile.write(line)
    outfile.close()

CP1252 doesn't support decoding byte 0x81, so the encoding is not CP1252. It might be ISO-8859-1 (aka latin1), but that codec decodes every byte to some character, so if the guess is wrong you may get mojibake instead of an error.
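The difference between the two codecs can be checked directly on a few bytes. This is only a minimal sketch: the `try_decode` helper and the `sample` bytes are made up for illustration and are not taken from the original file.

```python
def try_decode(data: bytes, encoding: str) -> str:
    """Return the repr of the decoded text, or a short note if decoding fails."""
    try:
        return repr(data.decode(encoding))
    except UnicodeDecodeError as exc:
        return f"error: {exc.reason}"


# Hypothetical sample: 0xE9 is valid in both codecs, 0x81 is the byte
# reported in the traceback.
sample = b"caf\xe9 \x81"

cp1252_result = try_decode(sample, "cp1252")  # fails on 0x81
latin1_result = try_decode(sample, "latin1")  # never fails, one char per byte

print("cp1252:", cp1252_result)
print("latin1:", latin1_result)
```

Because latin1 maps each byte to exactly one character, a successful decode does not prove that latin1 is the right encoding; it only means no exception is raised.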

Suggested code (but use the correct encoding if it is not latin1):

import csv

with (open('ProductFile.csv', 'r', encoding='latin1', newline='') as fin,
      open('newFile.csv', 'w', encoding='latin1', newline='') as fout):

    reader = csv.reader(fin)
    writer = csv.writer(fout)

    for row in reader:
        writer.writerow(row[:6])  # first 6 columns or whatever you want to write
                                  # The OP code had undefined variables
