I use Python 2.7 and I have created a pandas DataFrame named my_reader using pd.read_excel(my_path, encoding="utf-8"). One of its columns is named 'Descrição'.
I have all the column names in a list named client_list.
When I try to use my list's data as an index for my_reader, I get an error:
KeyError: 'Descri\xc3\xa7\xc3\xa3o'
It works fine with all the other names, which contain only English letters. When I print an element of client_list, the name is displayed correctly:
print client_list[0]
Descrição
But evaluating the same element in the interpreter shows the raw bytes:
client_list[0]
'Descri\xc3\xa7\xc3\xa3o'
So I can't use
my_reader[client_list[i]]
Any ideas?
Thanks
Your DataFrame was read with encoding="utf-8", so when you use the byte string 'Descri\xc3\xa7\xc3\xa3o' as the index of the DataFrame, decode it with "utf-8" first; then you can get the data. For example:
import pandas as pd
my_reader = pd.read_excel('comparison.xlsx', encoding="utf-8")
my_reader
my_reader will be:
Col_1 Col_2 file Descrição
0 Abc Abk cnl DFSDF
1 Nck Nck Abk DSFAF
2 xkl cnl Abc FDAS
3 mzn mzn NaN DFAS
You can use:
my_reader['Descrição'.decode('utf-8')]
This will give you the result:
0 DFSDF
1 DSFAF
2 FDAS
3 DFAS
Name: Descrição, dtype: object
For the other columns you can also index with a unicode string:
my_reader['Col_2'.decode("utf-8")]
Output:
0 Abk
1 Nck
2 cnl
3 mzn
Name: Col_2, dtype: object
Your list of column names is a list of str in the utf-8 encoding. But the pandas columns have unicode strings as names, so the easiest solution is to "decode" your list of column names to unicode as well.
client_list = [ c.decode("utf8") for c in client_list ]
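If the list might already contain some unicode items, a slightly more defensive variant of the same idea can help. This is only a sketch: to_text is a hypothetical helper name, and a plain dict with unicode keys stands in for the DataFrame's columns so the snippet runs without an Excel file.

```python
# -*- coding: utf-8 -*-

def to_text(name, encoding="utf-8"):
    """Hypothetical helper: decode byte strings, pass unicode through unchanged."""
    # On Python 2, `bytes` is an alias for `str`, so this also catches plain str.
    if isinstance(name, bytes):
        return name.decode(encoding)
    return name

# Stand-in for the DataFrame columns: keys are unicode, like pandas column names.
columns = {u'Col_1': 0, u'Descri\xe7\xe3o': 1, u'file': 2}

# Mixed list: two UTF-8 byte strings and one name that is already unicode.
client_list = [b'Col_1', b'Descri\xc3\xa7\xc3\xa3o', u'file']

decoded = [to_text(c) for c in client_list]
found = [c in columns for c in decoded]
```

With a real DataFrame the lookup would be my_reader[to_text(client_list[i])] instead of the dict membership test.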
I can't see into your dataframe, but I'll wager that all columns, not just the non-ascii ones, are unicode strings. The reason the other column names don't give you trouble is that Python 2 does a lot of implicit conversions behind the scenes (and pandas probably adds some of its own). With ascii strings the mapping between str and unicode is trivial, but with non-ascii text it is encoding-dependent. So just convert the entire list of names to unicode. Better yet, migrate all your text handling to unicode, as recommended for any application that sometimes deals with non-ascii data.
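To illustrate why only the non-ascii name breaks: Python 2's implicit str-to-unicode conversion uses the default ascii codec, which can be mimicked by decoding explicitly. The snippet below is a rough sketch, with try_decode as a hypothetical helper; it shows the ascii codec handling 'Col_2' but not the UTF-8 bytes of 'Descrição'.

```python
# -*- coding: utf-8 -*-

def try_decode(raw, codec):
    """Hypothetical helper: return the decoded text, or None if decoding fails."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return None

ascii_name = b'Col_2'
utf8_name = b'Descri\xc3\xa7\xc3\xa3o'   # UTF-8 bytes of 'Descrição'

ascii_ok = try_decode(ascii_name, 'ascii')        # pure ascii decodes fine
non_ascii_fails = try_decode(utf8_name, 'ascii')  # bytes above 0x7f are rejected
utf8_ok = try_decode(utf8_name, 'utf-8')          # the right codec recovers the name
```

Naming the codec explicitly, as in the last line, removes the dependence on the implicit conversion.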
A better solution to your predicament would be to switch to Python 3. Its handling of non-ascii encodings is much more intuitive and robust: you're likely to find that your code will "just work", as it did for me under Python 3.