
How to transform CSV files of various encodings into UTF-8

I have 95 small CSV files downloaded from the web. Their schemas are supposed to be very similar. I am trying to concatenate them with Python pandas, but when calling pd.read_csv, the varying encodings of those files cause problems, and I am not sure what the best way is to convert them to a consistent encoding, e.g. UTF-8. The encodings include:

ASCII text, with CRLF line terminators
Little-endian UTF-16 Unicode English text, with CRLF line terminators
Little-endian UTF-16 Unicode text, with CRLF line terminators
Little-endian UTF-16 Unicode text, with CRLF, CR line terminators
UTF-8 Unicode (with BOM) English text, with CRLF line terminators
UTF-8 Unicode (with BOM) text, with CRLF line terminators

The above list was generated with:

file -b *.csv | sort | uniq
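
One approach, sketched below, is to normalise every file to UTF-8 on disk before touching pandas. This is only a sketch based on the file output above: it assumes that utf-8-sig (which also reads plain ASCII and strips a UTF-8 BOM) and utf-16 (which relies on the BOM, or native byte order for BOM-less files) together cover all 95 files.

import glob

# Candidate encodings, in the order to try them. utf-8-sig also reads plain
# ASCII/UTF-8 and strips a UTF-8 BOM; utf-16 handles the little-endian files
# via their BOM (or native byte order if a file has no BOM).
CANDIDATE_ENCODINGS = ["utf-8-sig", "utf-16"]

def rewrite_as_utf8(path):
    with open(path, "rb") as f:
        raw = f.read()
    for enc in CANDIDATE_ENCODINGS:
        try:
            text = raw.decode(enc)
            break
        except UnicodeDecodeError:
            continue
    else:
        raise ValueError("could not decode %s with %s" % (path, CANDIDATE_ENCODINGS))
    # Normalise CRLF/CR line endings and write back as UTF-8 without a BOM.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    with open(path, "w", encoding="utf-8", newline="") as out:
        out.write(text)

for csv_path in glob.glob("*.csv"):
    rewrite_as_utf8(csv_path)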

Have you tried writing:

import pandas as pd
df = pd.read_csv(file, encoding='utf-8')
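
A fixed encoding='utf-8' will likely fail with a UnicodeDecodeError on the UTF-16 files, though. If the files have already been rewritten as UTF-8 (as sketched above), plain pd.read_csv is enough; otherwise, a per-file fallback like the following sketch can read and concatenate them directly (the glob pattern and the assumption that the columns line up are mine, not from the question):

import glob
import pandas as pd

frames = []
for path in glob.glob("*.csv"):
    try:
        # utf-8-sig also reads plain ASCII/UTF-8 and strips a UTF-8 BOM.
        frame = pd.read_csv(path, encoding="utf-8-sig")
    except UnicodeDecodeError:
        # Fall back for the little-endian UTF-16 files.
        frame = pd.read_csv(path, encoding="utf-16")
    frames.append(frame)

combined = pd.concat(frames, ignore_index=True)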
