
pandas reading csv file encoding error

I have an ISO-8859-9 encoded CSV file and am trying to read it into a DataFrame. Here is the code and the error I got.

import pandas as pd
iller = pd.read_csv('/Users/me/Documents/Works/map/dist.csv', sep=';', encoding='iso-8859-9')
iller.head()

The error is:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 250: ordinal not in range(128)

The code below works without error:

import codecs
myfile = codecs.open('/Users/me/Documents/Works/map/dist.csv', "r", encoding='iso-8859-9')
for a in myfile:
    print(a)

My question is: why is pandas not reading my correctly encoded file, and is there any way to make it read the file?

It is not possible to see what could be off with your data, of course, but if you can read the data without issues using codecs, then one idea would be to write the file out in UTF-8 encoding and read that file instead:

import codecs
filename = '/Users/me/Documents/Works/map/dist.csv'
target_filename = '/Users/me/Documents/Works/map/dist-utf-8.csv'
myfile = codecs.open(filename, "r", encoding='iso-8859-9')
f_contents = myfile.read()
myfile.close()  # release the file handle once the contents are read

or

import codecs
with codecs.open(filename, 'r', encoding='iso-8859-9') as fh:
    f_contents = fh.read()

# write out in UTF-8
with codecs.open(target_filename, 'w', encoding='utf-8') as fh:
    fh.write(f_contents)
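
The same re-encoding idea can be wrapped in a small helper. This is only a sketch: the function name convert_encoding, the chunk size, and the sample rows are my own assumptions for illustration, not anything from your file. It uses io.open (available in both Python 2 and 3) and copies the text in chunks, so a large CSV does not have to fit in memory.

```python
import io
import os
import tempfile


def convert_encoding(src_path, dst_path, src_encoding="iso-8859-9",
                     dst_encoding="utf-8", chunk_size=64 * 1024):
    """Re-encode a text file chunk by chunk (hypothetical helper)."""
    # newline="" disables newline translation so line endings are preserved.
    with io.open(src_path, "r", encoding=src_encoding, newline="") as src, \
         io.open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)


# Demo on a throwaway file with made-up sample rows (not the asker's CSV).
tmp_dir = tempfile.mkdtemp()
source = os.path.join(tmp_dir, "dist.csv")
target = os.path.join(tmp_dir, "dist-utf-8.csv")

# Turkish letters that exist in ISO-8859-9 but not in ASCII.
sample = u"il;ilce\n\u0130stanbul;\u015ei\u015fli\n"
with io.open(source, "w", encoding="iso-8859-9", newline="") as fh:
    fh.write(sample)

convert_encoding(source, target)

# Read the converted file back as UTF-8 to confirm the text survived.
with io.open(target, "r", encoding="utf-8", newline="") as fh:
    converted = fh.read()
```

After the conversion, the UTF-8 file could be passed to pandas with pd.read_csv(target, sep=';', encoding='utf-8'). Whether that avoids the error depends on where the 'ascii' codec is actually being invoked, which I cannot tell from the traceback line alone.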

I hope this helps!

