
I'm having trouble with code in PyCharm. I'm trying to read a csv file, but I'm getting a Unicode error saying that bytes at specific positions can't be decoded

My code is shown below. I am using PyCharm as my IDE, and the csv file I'm using is from MS Excess. I've encoded the csv as UTF-8, and I am trying to read the file using pandas. I want to be able to distinguish between objects and ints when I call df.info(), which is also why I didn't change the encoding to 'latin-1' or 'ISO...'.

import pandas as pd  
import numpy as np  
import matplotlib.pyplot as plt  
plt.style.use('fivethirtyeight')  
cols = ['sentiment','id','date','query_string','user','text']  
df = pd.read_csv("trainingandtestdata\\training.1600000.processed.noemoticon.csv", header=None, 
names=cols, encoding='utf-8')#low_memory=False dtype='unicode' encoding='latin1'  
df.head()  
df.info()  
df.sentiment.value_counts()

My error is shown below. How do I fix "can't decode bytes in position xxxx-xxxx"?

"C:\Users\dashg\PycharmProjects\Twitter Sentiment\venv\Scripts\python.exe" "C:/Users/dashg/PycharmProjects/Twitter Sentiment/Reviewer.py"
Traceback (most recent call last):
  File "C:/Users/dashg/PycharmProjects/Twitter Sentiment/Reviewer.py", line 6, in <module>
    df = pd.read_csv("trainingandtestdata\\training.1600000.processed.noemoticon.csv", header=None,
names=cols, encoding='utf-8')#low_memory=False dtype='unicode' encoding='latin1'
  File "C:\Users\dashg\PycharmProjects\Twitter Sentiment\venv\lib\site-packages\pandas\io\parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\dashg\PycharmProjects\Twitter Sentiment\venv\lib\site-packages\pandas\io\parsers.py", line 454, in _read
    data = parser.read(nrows)
  File "C:\Users\dashg\PycharmProjects\Twitter Sentiment\venv\lib\site-packages\pandas\io\parsers.py", line 1133, in read
    ret = self._engine.read(nrows)
  File "C:\Users\dashg\PycharmProjects\Twitter Sentiment\venv\lib\site-packages\pandas\io\parsers.py", line 2037, in read
    data = self._reader.read(nrows)
  File "pandas\_libs\parsers.pyx", line 860, in pandas._libs.parsers.TextReader.read
  File "pandas\_libs\parsers.pyx", line 875, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas\_libs\parsers.pyx", line 929, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 916, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas\_libs\parsers.pyx", line 2063, in pandas._libs.parsers.raise_parser_error
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 51845-51846: invalid continuation byte

Process finished with exit code 1

Your file is not actually UTF-8 encoded, even though you pass encoding='utf-8' to the read_csv method. Use another encoding to solve the problem, such as 'latin' or 'ISO-8859-1' (in Python these are two names for the same codec). I refer you to this link for help.
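As a rough illustration of the advice above, the sketch below tries a few encodings in turn and keeps the first one that decodes cleanly. The helper name `read_csv_with_fallback`, the candidate encoding list, and the two sample rows are assumptions made up for this example; they are not from the question. It also suggests that the asker's worry is unfounded: the encoding only controls how bytes become text, so the numeric column should still be inferred as an integer dtype.

```python
import io

import pandas as pd


def read_csv_with_fallback(raw, encodings=("utf-8", "cp1252", "latin-1"), **kwargs):
    """Hypothetical helper: try each encoding and return (DataFrame, encoding_used)."""
    for enc in encodings:
        try:
            # A fresh BytesIO per attempt, so every try starts at byte 0.
            return pd.read_csv(io.BytesIO(raw), encoding=enc, **kwargs), enc
        except UnicodeDecodeError:
            continue  # this encoding cannot decode the data; try the next one
    raise ValueError(f"none of {encodings} could decode the data")


# Made-up sample: byte 0xE9 is 'é' in cp1252/latin-1 but is invalid UTF-8 here.
sample = b'0,1,"caf\xe9 time"\n4,2,"plain ascii"\n'
cols = ["sentiment", "id", "text"]

df, used = read_csv_with_fallback(sample, header=None, names=cols)
print(used)
print(df.dtypes)
```

For the real file, the bytes could come from `open(path, 'rb').read()`, or, once a working encoding is known, it can simply be passed to `pd.read_csv` as `encoding=...`. Note that 'latin-1' maps every byte to some character, so it never raises; it is best kept last in the list.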

Worst case, if none of this works, you can open the file in 'rb' mode (open(file, 'rb')) and parse it yourself by splitting each line of data on the csv delimiter.
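A minimal sketch of that last-resort approach, using only the standard library. The function names `parse_csv_bytes` and `read_csv_binary` are hypothetical. Instead of splitting lines on the delimiter by hand, it swaps in the `csv` module, since tweet text may contain quoted commas that a plain split would break; undecodable bytes are replaced rather than raising.

```python
import csv
import io


def parse_csv_bytes(raw, encoding="utf-8", delimiter=","):
    """Decode raw CSV bytes leniently and return the rows as lists of strings."""
    # errors="replace" turns each undecodable byte into U+FFFD instead of raising.
    text = raw.decode(encoding, errors="replace")
    # newline="" leaves line endings untouched so csv can handle quoted newlines.
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


def read_csv_binary(path, **kwargs):
    """Read the whole file in 'rb' mode, then parse it with parse_csv_bytes."""
    with open(path, "rb") as fh:
        return parse_csv_bytes(fh.read(), **kwargs)


# Made-up sample: the quoted field contains a comma and one invalid UTF-8 byte.
rows = parse_csv_bytes(b'0,1,"caf\xe9, time"\n4,2,plain\n')
print(rows)
```

This loads the entire file into memory and yields strings only, so numeric columns would still need converting afterwards. It trades some fidelity (replaced characters) for never failing on bad bytes.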

I was having the same problem, but in my case the solution was really easy. My IDE is PyCharm 2020.1 and the .csv has iso-8859-1 encoding. I had tried everything without luck, so I decided to check my IDE config. I went to:

  1. File
  2. Settings
  3. Left column: Editor
  4. In Editor: File Encodings

Then I added my .csv file with the + button on the right side, and finally changed the IDE's config. Change it all to iso, because the default was utf-8, and use the exact character to work with the file, which in my case is: ?. Hope this works.

It's better to save that csv as xlsx and read it with pd.read_excel.
