
'utf-8' codec can't decode byte 0xd5 in position 2912: invalid continuation byte error when reading a CSV file in Python

I am looping through the rows of a CSV file, but partway through I get this error: 'utf-8' codec can't decode byte 0xd5 in position 2912: invalid continuation byte

I'm just trying to get the row count for the file with this function:

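import csv

def count_lines(filename):
    row_stored = ""  # last row parsed successfully, printed if an error occurs
    try:
        # No encoding argument: Python uses the platform default (UTF-8 here,
        # judging by the error message), which is what raises the error.
        with open(filename, newline="") as csvfile:
            data_reader = csv.reader(csvfile)
            next(data_reader)  # skip the header row
            count = 0
            for index, row in enumerate(data_reader):
                if index == 1220119:
                    print(row)  # inspect a specific row near the failure point
                row_stored = row
                count += 1
            return count
    except Exception as e:
        print(f'There was a problem with your request: {e}\n', row_stored)
        return False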
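Since only the count is needed, the text does not have to be decoded at all. The sketch below uses a hypothetical helper, `count_lines_binary` (not part of the original question), that opens the file in binary mode and counts newline bytes in 1 MiB chunks, so no codec is ever involved and memory use stays flat even for a multi-gigabyte file. It assumes no quoted field contains an embedded line break; if some do, it will report more lines than `csv.reader` yields rows.

```python
def count_lines_binary(filename, chunk_size=1 << 20, skip_header=True):
    """Count newline-terminated lines without decoding any bytes.

    Hypothetical helper: assumes no CSV field contains an embedded newline.
    """
    count = 0
    last_byte = b""
    with open(filename, "rb") as f:  # binary mode: no decoding, so no UnicodeDecodeError
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            count += chunk.count(b"\n")
            last_byte = chunk[-1:]
    # A final line without a trailing newline still counts as a line.
    if last_byte and last_byte != b"\n":
        count += 1
    # Mirror count_lines, which skips the header row.
    if skip_header and count > 0:
        count -= 1
    return count
```

Counting `b"\n"` also works for files with Windows `\r\n` line endings, since each of those lines still ends in one newline byte.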

The row above the erroring row looks like this:

['817949019495', 'QMMZN1300568', '4/28/2017', 'Digital Revenue', 'Track', 'Download Europe', 'GB', 'Amazon International - UK', '', '2', '1.2126506333579932', '109926407', '2/28/2017']

And the row that throws the error looks like this:

['817949019495', 'QMMZN1300568', '4/28/2017', 'Digital Revenue', 'Track', 'Download Europe', 'GB', 'Amazon International - UK', '', '2', '1.2126506333579932', '109926407', '2/28/2017']

I don't see any difference between the two. Is there something about the formatting of this particular row that I'm not seeing?
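The two rows are most likely identical because they are the same row. The decoding error is raised while the file object reads and decodes its next block of bytes, before the offending row can reach `csv.reader`, so `row_stored` still holds the last row that decoded cleanly. The bad byte lies somewhere after it, and "position 2912" is most likely an offset within that buffered block rather than within the row or the file. A minimal sketch for pinpointing it, using a hypothetical helper `find_first_bad_line`, reads the raw bytes line by line and tries to decode each line as UTF-8. This is safe because the newline byte 0x0A never appears inside a multi-byte UTF-8 sequence.

```python
def find_first_bad_line(filename):
    """Locate the first line of a file that is not valid UTF-8.

    Hypothetical helper. Returns (line_number, byte_offset, bad_bytes), where
    line_number is 1-based and counts the header, and byte_offset is measured
    from the start of the file. Returns None if every line decodes.
    """
    offset = 0  # byte offset of the start of the current line
    with open(filename, "rb") as f:  # binary mode: iteration yields raw byte lines
        for line_number, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as e:
                # e.start/e.end index into `raw`, so shift by the line's offset
                return line_number, offset + e.start, raw[e.start:e.end]
            offset += len(raw)
    return None
```

Printing the returned tuple gives the physical line number (header included), the absolute byte offset, and the exact offending byte, which can then be inspected in a hex viewer.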

Note: This CSV file is 3.17 GB. I don't know if that's a contributing factor.

Changing the encoding solved the problem:

with open(filename, encoding="ISO-8859-1") as csvfile:
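This works because ISO-8859-1 (Latin-1) assigns a character to every one of the 256 byte values, so decoding can never fail, whatever the file actually contains. Applied to the original task, a minimal sketch might look like the hypothetical `count_rows` below. It also passes `newline=""`, which the `csv` module documentation recommends for files handed to `csv.reader`.

```python
import csv


def count_rows(filename, encoding="ISO-8859-1"):
    """Count the data rows of a CSV file, excluding the header row.

    Hypothetical helper. With the default ISO-8859-1 encoding every byte maps
    to a character, so this never raises UnicodeDecodeError; pass another
    encoding (e.g. "utf-8") to decode strictly instead.
    """
    with open(filename, newline="", encoding=encoding) as csvfile:
        reader = csv.reader(csvfile)
        next(reader, None)  # skip the header row; an empty file yields 0
        return sum(1 for _ in reader)
```

Calling `count_rows(filename, encoding="utf-8")` on the same file should reproduce the original `UnicodeDecodeError`, which is a quick way to confirm that the encoding, not the row count logic, was the problem.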

Here's evidence that the encoding change worked. I'm still not sure what caused the error, but the data remained unaltered. [MySQL screenshot]
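Two properties of Latin-1 bear on "the data remained unaltered". Decoding and then re-encoding with Latin-1 returns exactly the original bytes, so nothing is lost if the text is written back out as ISO-8859-1. However, each byte is displayed as whatever character Latin-1 assigns to it, which is only correct if the file really is Latin-1. The byte from the error, 0xd5, is 'Õ' in Latin-1 and in Windows-1252 but a right single quotation mark (’) in Mac OS Roman. The sketch below compares them with a hypothetical helper, `show_interpretations`; the question does not say which encoding produced the file, so the list of candidate codecs is an assumption.

```python
def show_interpretations(raw, codecs=("latin-1", "cp1252", "mac_roman")):
    """Decode the same bytes with several single-byte codecs, for comparison.

    Hypothetical helper; the codec list is an assumption about likely sources.
    """
    return {codec: raw.decode(codec) for codec in codecs}


# Latin-1 round trip: every byte decodes and re-encodes to itself.
all_bytes = bytes(range(256))
assert all_bytes.decode("latin-1").encode("latin-1") == all_bytes

# The byte from the error message, read three different ways.
for codec, text in show_interpretations(b"\xd5").items():
    print(f"{codec}: {text}")
```

If the affected rows are expected to contain apostrophes rather than accented letters, that would hint at a Mac-exported file, and reading it with `encoding="mac_roman"` would be worth trying.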
