
How to display Chinese strings in a pandas DataFrame in Python

Suppose I have the data below. Even though I used #coding=utf-8 to declare the encoding, the output still shows ??? instead of the Chinese strings.

#coding=utf-8
import pandas as pd

df = pd.DataFrame({ '日期' : ['2015-01-07', '2014-12-17', '2015-01-21', '2014-11-19', '2015-01-17', '2015-02-26', '2015-01-04', '2014-12-20', '2014-12-07', '2015-01-06'],
                    '股票代码': ['600795', '600268', '002428', '600031', '002736', '600216', '000799', '601600', '601939', '000898']
                    })

print df

Try adding

pd.options.display.encoding = sys.stdout.encoding

near the top of your file (after importing sys and pandas). By default, pandas uses utf-8 when it encodes unicode strings for display, which may not match the encoding your console expects.

Python sets sys.stdout.encoding to the encoding it detects your console or terminal is using.
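To see which encoding Python detected, you can inspect sys.stdout.encoding directly. The snippet below is a minimal sketch, not part of pandas: console_encoding is a hypothetical helper name, and the fallback order (locale, then utf-8) is an assumption. The fallback matters because sys.stdout.encoding can be None on Python 2 when output is piped or redirected.

```python
import locale
import sys


def console_encoding(default="utf-8"):
    """Return the encoding of the current stdout, with fallbacks.

    Hypothetical helper: the fallback order is an assumption,
    not something pandas itself does.
    """
    # May be None (e.g. Python 2 with redirected output) or missing
    # entirely if stdout was replaced by a custom object.
    detected = getattr(sys.stdout, "encoding", None)
    return detected or locale.getpreferredencoding(False) or default


print(console_encoding())
```

The returned value is what you would assign to pd.options.display.encoding instead of using sys.stdout.encoding directly.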


# coding=utf-8
import sys
import pandas as pd

# Make pandas encode its output the same way the console does.
pd.options.display.encoding = sys.stdout.encoding

df = pd.DataFrame(
    {'日期' : ['2015-01-07', '2014-12-17', '2015-01-21', '2014-11-19',
               '2015-01-17', '2015-02-26', '2015-01-04', '2014-12-20',
               '2014-12-07', '2015-01-06'],
     '股票代码': ['600795', '600268', '002428', '600031', '002736', '600216',
                  '000799', '601600', '601939', '000898']})

print(df)

Note that even though you defined the column names as plain (byte) strings, pandas converts them to unicode:

In [158]: df.columns
Out[158]: Index([u'日期', u'股票代码'], dtype='object')

This is why, when you call print(df), pandas uses pd.options.display.encoding to encode these values.
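The effect of a mismatched encoding can be reproduced without pandas. The sketch below assumes the question marks come from encoding with the 'replace' error handler, which substitutes ? for every character the target encoding cannot represent; ascii stands in here for whatever encoding the console actually reported.

```python
# -*- coding: utf-8 -*-
column = u'日期'

# utf-8 can represent every character, so the round trip is lossless.
utf8_bytes = column.encode('utf-8')

# ascii cannot represent Chinese characters; with errors='replace'
# each unencodable character becomes a single question mark.
ascii_bytes = column.encode('ascii', 'replace')

print(repr(utf8_bytes))
print(repr(ascii_bytes))  # two question marks, one per character
```

Setting the display encoding to one that matches the console and can represent the characters avoids the lossy second case.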
