
How to transform CSV files of various encodings into UTF-8

I have 95 small CSV files downloaded from the web. Their schemas are supposed to be very similar. I am trying to concatenate them with Python pandas, but when calling pd.read_csv, the varying encodings of those files cause problems, and I am not sure what the best way is to convert them to a consistent encoding, e.g. UTF-8. The encodings include:

ASCII text, with CRLF line terminators
Little-endian UTF-16 Unicode English text, with CRLF line terminators
Little-endian UTF-16 Unicode text, with CRLF line terminators
Little-endian UTF-16 Unicode text, with CRLF, CR line terminators
UTF-8 Unicode (with BOM) English text, with CRLF line terminators
UTF-8 Unicode (with BOM) text, with CRLF line terminators

The above list was generated with:

file -b *.csv | sort | uniq
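
One approach, sketched below, is to normalise every file to UTF-8 on disk before touching pandas. This is only a sketch based on the file output above: it assumes that utf-8-sig (which also reads plain ASCII and strips a UTF-8 BOM) and utf-16 (which relies on the BOM, or native byte order for BOM-less files) together cover all 95 files.

import glob

# Candidate encodings, in the order to try them. utf-8-sig also reads plain
# ASCII/UTF-8 and strips a UTF-8 BOM; utf-16 handles the little-endian files
# via their BOM (or native byte order if a file has no BOM).
CANDIDATE_ENCODINGS = ["utf-8-sig", "utf-16"]

def rewrite_as_utf8(path):
    with open(path, "rb") as f:
        raw = f.read()
    for enc in CANDIDATE_ENCODINGS:
        try:
            text = raw.decode(enc)
            break
        except UnicodeDecodeError:
            continue
    else:
        raise ValueError("could not decode %s with %s" % (path, CANDIDATE_ENCODINGS))
    # Normalise CRLF/CR line endings and write back as UTF-8 without a BOM.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    with open(path, "w", encoding="utf-8", newline="") as out:
        out.write(text)

for csv_path in glob.glob("*.csv"):
    rewrite_as_utf8(csv_path)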

Have you tried writing:

import pandas as pd
df = pd.read_csv(file, encoding='utf-8')
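
A fixed encoding='utf-8' will likely fail with a UnicodeDecodeError on the UTF-16 files, though. If the files have already been rewritten as UTF-8 (as sketched above), plain pd.read_csv is enough; otherwise, a per-file fallback like the following sketch can read and concatenate them directly (the glob pattern and the assumption that the columns line up are mine, not from the question):

import glob
import pandas as pd

frames = []
for path in glob.glob("*.csv"):
    try:
        # utf-8-sig also reads plain ASCII/UTF-8 and strips a UTF-8 BOM.
        frame = pd.read_csv(path, encoding="utf-8-sig")
    except UnicodeDecodeError:
        # Fall back for the little-endian UTF-16 files.
        frame = pd.read_csv(path, encoding="utf-16")
    frames.append(frame)

combined = pd.concat(frames, ignore_index=True)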
