I have a CSV file with three columns; the third column contains long text. When I tried to open the file with pandas.read_csv, I got this error message:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa1 in position 0: invalid start byte
However, the file opens without any problem with:
with open('file.csv', 'r', encoding='utf-8', errors = "ignore") as csvfile:
I don't know how to convert this data to a DataFrame, and I don't think pandas.read_csv handles this error properly.
So, how can I open this file and get a DataFrame?
Try this:
Open the CSV file in a text editor and save it in UTF-8 format.
Then read the file as normal:
import pandas
csvfile = pandas.read_csv('file.csv', encoding='utf-8')
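If re-saving by hand in a text editor is not practical, the same conversion can be sketched in Python. This is only a minimal sketch: the helper name reencode_to_utf8 is made up for illustration, and the default source encoding cp1252 is a guess that has to be replaced with the file's real encoding.

```python
def reencode_to_utf8(src, dst, source_encoding="cp1252"):
    """Rewrite src as UTF-8, assuming it is currently stored as source_encoding.

    source_encoding is an assumption; a wrong value gives garbled text
    or a UnicodeDecodeError.
    """
    # newline="" keeps the original line endings untouched
    with open(src, "r", encoding=source_encoding, newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.write(text)
```

After calling reencode_to_utf8('file.csv', 'file_utf8.csv'), the new file should load with pandas.read_csv('file_utf8.csv', encoding='utf-8'), provided the guessed source encoding was right.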
I would try using the built-in csv reader, then put the data into pandas.
import csv
with open('eggs.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in spamreader:
        print(', '.join(row))
If this also fails, you have at least confirmed that the problem is in the CSV file itself and not in pandas choking on encodings.
The other recommendation is to make sure you are using Python 3.x, which handles encoding issues much better than 2.7.
If you can provide your sample, I can test it myself and update my answer accordingly.
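To cover the "put the data into pandas" step, here is a rough sketch that reads the rows with csv.reader and builds a DataFrame from them. The function name csv_to_dataframe is hypothetical, and the sketch assumes a comma-delimited file whose first row is the header and whose rows all have the same number of fields. With errors='replace', undecodable bytes become the U+FFFD replacement character instead of raising.

```python
import csv

import pandas as pd


def csv_to_dataframe(path, encoding="utf-8", errors="replace"):
    """Read path with csv.reader and return the rows as a DataFrame.

    Assumes the first row holds the column names (an assumption,
    not something the question states).
    """
    with open(path, newline="", encoding=encoding, errors=errors) as csvfile:
        reader = csv.reader(csvfile)
        header = next(reader)   # first row -> column names
        rows = list(reader)     # remaining rows -> data
    # Every value is still a string at this point; convert types afterwards
    return pd.DataFrame(rows, columns=header)
```

Because csv.reader returns strings only, numeric columns would need converting afterwards, for example with pd.to_numeric.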
You can try another encoding, such as "ISO-8859-1".
In your case:
with open('file.csv', 'r', encoding = 'ISO-8859-1', errors = "ignore") as csvfile:
or try this:
import pandas as pd
data_file = pd.read_csv("file.csv", encoding = "ISO-8859-1")
print(data_file)
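If the real encoding is unknown, one option is to try a few candidates in order and keep the first one that decodes. The sketch below is illustrative only: read_csv_with_fallback is a made-up helper, and the candidate list is an assumption that should be adjusted to the encodings you actually expect.

```python
import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "cp1252", "ISO-8859-1")):
    """Return (DataFrame, encoding) for the first encoding that decodes path."""
    if not encodings:
        raise ValueError("at least one encoding is required")
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc), enc
        except UnicodeDecodeError as exc:
            last_error = exc  # remember the failure and try the next candidate
    raise last_error
```

ISO-8859-1 maps every possible byte to a character, so it never raises a decode error. That is why it comes last here, and also why the resulting text should be checked by eye: if the file was really written in some other encoding, the load succeeds but the characters come out wrong. If simply dropping the undecodable bytes is acceptable, as with errors="ignore" in the question, pandas 1.3 and later also accept pd.read_csv("file.csv", encoding="utf-8", encoding_errors="ignore").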