
Pandas.read_csv() with special characters (accents) in column names

I have a CSV file that contains some data with these column names:

  • "PERIODE"
  • "IAS_brut"
  • "IAS_lissé"
  • "Incidence_Sentinelles"

I have a problem with the third one, "IAS_lissé", which is misinterpreted by the pd.read_csv() method and returned as "IAS_liss�".

What is that character?

Because it's causing a bug in my Flask application, is there a way to read that column another way, without modifying the file?

In [1]: import pandas as pd

In [2]: pd.read_csv("Openhealth_S-Grippal.csv",delimiter=";").columns

Out[2]: Index([u'PERIODE', u'IAS_brut', u'IAS_liss�', u'Incidence_Sentinelles'], dtype='object')

You can change the encoding parameter of read_csv; see the pandas documentation for read_csv. The standard encodings that Python supports are listed in the documentation of the codecs module.

I believe for your example you can use the utf-8 encoding (assuming that your language is French).

df = pd.read_csv("Openhealth_S-Grippal.csv", delimiter=";", encoding='utf-8')

Here's an example showing some sample output. All I did was make a csv file with one column, using the problem characters.

df = pd.read_csv('sample.csv', encoding='utf-8')

Output:

    IAS_lissé
0   1
1   2
2   3
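
If you are not sure which encoding the file was saved with, one option is to try a few candidates in order. This is only a sketch: read_csv_with_fallback and its default candidate list are hypothetical names chosen for illustration, not part of pandas. Note that latin1 accepts every byte value, so as the last candidate it never raises, although it can still give wrong characters if the file actually uses some other encoding.

```python
import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8-sig", "cp1252", "latin1"), **kwargs):
    """Try each candidate encoding in turn (hypothetical helper, not a pandas API).

    Returns a (DataFrame, encoding) tuple for the first encoding that decodes
    the file without raising UnicodeDecodeError.
    """
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc, **kwargs), enc
        except UnicodeDecodeError:
            # This candidate cannot decode the file; try the next one.
            continue
    raise ValueError("none of the candidate encodings could decode %r" % (path,))
```

For the file in the question, the call would look like read_csv_with_fallback("Openhealth_S-Grippal.csv", delimiter=";"). Extra keyword arguments are passed straight through to pd.read_csv.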

I had the same problem with Spanish and solved it with the "latin1" encoding:

import pandas as pd

pd.read_csv("Openhealth_S-Grippal.csv", delimiter=";", encoding='latin1')

Hope it helps!
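
As for what the character in the question is: it looks like U+FFFD, the Unicode replacement character, which is commonly shown in place of bytes that cannot be decoded in the expected encoding. Here is a minimal sketch using only the standard library, assuming the file was saved as Latin-1 (the question does not confirm this):

```python
# Assumption: the column name was written to disk as Latin-1 bytes.
raw = "IAS_lissé".encode("latin1")
print(raw)  # b'IAS_liss\xe9'

# Decoding those bytes as UTF-8 fails on 0xe9; with errors="replace"
# the undecodable byte becomes U+FFFD instead of raising an exception.
as_utf8 = raw.decode("utf-8", errors="replace")
print(as_utf8 == "IAS_liss\ufffd")  # True

# Decoding with the encoding the bytes were written in restores the accent.
as_latin1 = raw.decode("latin1")
print(as_latin1 == "IAS_lissé")  # True
```

That would explain why latin1 works where utf-8 does not: the accented letter is stored as a single byte, which is not valid UTF-8 on its own.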

Using utf-8 didn't work for me. For example, this piece of code:

    import pandas as pd

    bla = pd.DataFrame(data=[1, 2])
    bla.to_csv('funkyNamé , things.csv')
    # Read the file back using its accented filename
    blabla = pd.read_csv('funkyNamé , things.csv', delimiter=";", encoding='utf-8')
    blabla

This ultimately raised: OSError: Initializing from file failed

I know you said you didn't want to modify the file. If you meant the file content vs the filename, I would rename the file to something without an accent, read the csv file under its new name, then reset the filename back to its original name.

    import os
    import pandas as pd

    originalfolder = r'C:\Users\myself'
    originalfilepath = originalfolder + "\\funkyNamé , things.csv"
    # Rename to an accent-free name, read the file, then restore the original name
    os.rename(originalfilepath, originalfolder + "\\tempName.csv")
    df = pd.read_csv(originalfolder + "\\tempName.csv", encoding='ISO-8859-1')
    os.rename(originalfolder + "\\tempName.csv", originalfilepath)

If you did mean "without modifying the file name", my apologies for not being helpful to you, and I hope this helps someone else.
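
A possible alternative that avoids renaming, which I have not tested on the setup above: open the file yourself and pass the open handle to read_csv, which accepts file-like objects as well as paths. In this sketch, read_csv_via_handle is a hypothetical helper name, and the ISO-8859-1 default is just an assumption carried over from the snippet above.

```python
import pandas as pd


def read_csv_via_handle(path, encoding="ISO-8859-1", **kwargs):
    """Open the file in Python and let pandas parse the open text handle.

    Hypothetical helper: Python opens the path itself, so pandas never
    has to resolve the (possibly accented) filename.
    """
    # newline="" leaves line-ending handling to the CSV parser.
    with open(path, "r", encoding=encoding, newline="") as handle:
        return pd.read_csv(handle, **kwargs)
```

Because the handle is already opened in text mode, the decoding is done by open(), so the encoding is chosen there rather than in read_csv.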

Try using:

import pandas as pd    
df = pd.read_csv('file_name.csv', encoding='utf-8-sig')
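
For context on when utf-8-sig helps: some programs write a byte order mark (BOM) at the start of UTF-8 files, and plain utf-8 decoding keeps it as an invisible character glued to the first column name. A small standard-library sketch, assuming a file that starts with a UTF-8 BOM (the header string here is made up for illustration):

```python
import codecs

# Assumption: the file starts with a UTF-8 byte order mark.
raw = codecs.BOM_UTF8 + "PERIODE;IAS_lissé".encode("utf-8")

# Plain utf-8 keeps the BOM as U+FEFF in front of the first column name.
with_bom = raw.decode("utf-8")
print(repr(with_bom[0]))  # '\ufeff'

# utf-8-sig strips the BOM, so the first column name comes out clean.
without_bom = raw.decode("utf-8-sig")
print(without_bom.split(";")[0])  # PERIODE
```

If the file has no BOM, utf-8-sig behaves like utf-8, so it will not fix a file that was saved in a non-UTF-8 encoding such as Latin-1.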

