I have 95 small CSV files downloaded from the web. Their schemas are supposed to be very similar. I am trying to concatenate them with Python pandas, but when I call pd.read_csv, the files' differing encodings cause problems, and I am not sure of the best way to convert them to one consistent encoding, e.g. UTF-8. The encodings include:
ASCII text, with CRLF line terminators
Little-endian UTF-16 Unicode English text, with CRLF line terminators
Little-endian UTF-16 Unicode text, with CRLF line terminators
Little-endian UTF-16 Unicode text, with CRLF, CR line terminators
UTF-8 Unicode (with BOM) English text, with CRLF line terminators
UTF-8 Unicode (with BOM) text, with CRLF line terminators
The list above was generated with:
file -b *.csv | sort | uniq
Have you tried writing:
import pandas as pd
# `file` is the path to a single CSV file
df = pd.read_csv(file, encoding='utf-8')
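Note that encoding='utf-8' on its own will raise a UnicodeDecodeError on the UTF-16 files in the question, since their leading bytes are not valid UTF-8. One possible workaround is to pick the codec per file from its byte-order mark (BOM). The following is only a minimal sketch, assuming the files use nothing beyond the encodings listed above and that every UTF-16 file starts with a BOM; the helper names detect_encoding and convert_to_utf8 are made up for illustration and are not part of pandas.

```python
import codecs
from pathlib import Path


def detect_encoding(path):
    """Guess a codec name from the file's byte-order mark (BOM), if any."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # decodes UTF-8 and drops the BOM
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"     # this codec reads the BOM to pick the byte order
    return "utf-8"          # assumption: no BOM means ASCII or plain UTF-8


def convert_to_utf8(src, dst):
    """Rewrite src as BOM-less UTF-8 at dst, leaving line terminators untouched."""
    # Working on bytes avoids any newline translation (CRLF and CR are kept).
    text = Path(src).read_bytes().decode(detect_encoding(src))
    Path(dst).write_bytes(text.encode("utf-8"))
```

After converting every file this way, pd.read_csv(path, encoding='utf-8') should work uniformly. Alternatively, skip the conversion and pass the guess straight through with pd.read_csv(path, encoding=detect_encoding(path)), then combine the resulting frames with pd.concat. If some files turn out to have no BOM and are not UTF-8, this sketch will not catch them and a detection library would be needed instead.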