
Python - Reading CSV UnicodeError

I have exported a CSV from Kaggle - https://www.kaggle.com/ngyptr/python-nltk-sentiment-analysis. However, when I attempt to iterate through the file, I receive Unicode errors concerning certain characters that cannot be encoded.

File "C:\Program Files\Python35\lib\encodings\cp850.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]

UnicodeEncodeError: 'charmap' codec can't encode character '\…' in position 264: character maps to <undefined>

I have enabled utf-8 encoding while opening the file, which I assumed would have decoded the ASCII characters. Evidently not.

My Code:

import csv

# sentimentCsvColumn, textCsvColumn, sentimentScores and accuracyCount
# are assumed to be defined earlier in the script.
with open("sentimentDataSet.csv", "r", encoding="utf-8", errors="ignore", newline="") as file:
    reader = csv.reader(file)
    for row in reader:
        if row:
            print(row)
            if row[sentimentCsvColumn] == sentimentScores(row[textCsvColumn]):
                accuracyCount += 1
    print(accuracyCount)

That's an encode error raised while printing the row; it has little to do with reading the actual CSV.

Your Windows terminal uses the CP850 encoding, which can't represent every Unicode character.
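To check that the failure comes from encoding text for the console and not from decoding the file, here is a minimal sketch that encodes sample strings to CP850 directly. The sample text and the helper name can_encode are made up for illustration; they are not taken from the question or the dataset.

```python
# Hypothetical sample: U+00E9 (e with acute accent) exists in CP850,
# U+2026 (horizontal ellipsis) does not.
sample = "caf\u00e9 \u2026"


def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        # This is the same exception class print() raises on a CP850 console.
        return False


print(can_encode("caf\u00e9", "cp850"))  # accented letter only
print(can_encode(sample, "cp850"))       # includes the ellipsis
print(can_encode(sample, "utf-8"))       # UTF-8 covers all of Unicode
```

The string itself was decoded without any problem; only the conversion to the console's code page can fail.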

There are some things you can do here.

  • A simple way is to set the PYTHONIOENCODING environment variable to a combination that replaces whatever the console can't represent. Running set PYTHONIOENCODING=cp850:replace before starting Python makes Python replace characters unrepresentable in CP850 with question marks.
  • Change your terminal encoding to UTF-8: run chcp 65001 before running Python.
  • Encode the thing by hand before printing: print(str(data).encode('ascii', 'replace'))
  • Don't print the thing.
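The "encode by hand" option above can be sketched as a small helper. This is only an illustration under stated assumptions: console_safe is a hypothetical name, the sample row is made up, and the helper decodes the replaced bytes back to str so that print shows ordinary text instead of a bytes literal such as b'...'.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters that encoding can't represent replaced by '?'."""
    # Default to the console's own encoding; fall back to ASCII if it is unknown.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # errors="replace" substitutes '?' instead of raising UnicodeEncodeError.
    return text.encode(encoding, errors="replace").decode(encoding)


# Made-up row standing in for one line of the CSV.
row = ["Positive", "great caf\u00e9 \u2026"]
print(console_safe(str(row)))
```

Passing an explicit encoding, for example console_safe(text, "cp850"), shows what a CP850 console would display. On Python 3.7 and later, sys.stdout.reconfigure(errors="replace") has a similar effect for every print call, but it is not available on the Python 3.5 installation shown in the traceback.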
