
How to read a CSV file with Chinese characters in Python

My CSV file shows garbled text where there should be Chinese characters. I want to read the file into Python so that the Chinese characters display correctly. How do I do that? I tried pandas.read_csv with encodings such as gb2312 and gb18030, but they all raise an error like: UnicodeDecodeError: 'gb2312' codec can't decode byte 0xae in position 4: illegal multibyte sequence

My data:

CODE NAME LISTDATE FOUNDDATE TIME DATE EPTTM INDUSTRY LISTCITY
000001.SZ Âπ≥ÂÆâÈì∂Ë°å 3/4/1991 19871222 8 1/1/2007 0.030477768 Ω»⁄∑˛ŒÒ …Ó€⁄
000002.SZ ‰∏áÁßëA 29/1/1991 19840530 8 1/1/2007 0.025771537 ∑øµÿ≤˙ …Ó€⁄
000004.SZ ÂõΩÂÜúÁßëÊäÄ 14/1/1991 19860505 8 1/1/2007 -0.05297144 “Ω“©…˙ŒÔ …Ó€⁄
000005.SZ ‰∏ñÁ∫™ÊòüÊ∫ê 10/12/1990 19870730 8 1/1/2007 -0.024968897 ∑øµÿ≤˙ …Ó€⁄
000006.SZ Ê∑±Êå؉∏öA 27/4/1992 19850525 8 1/1/2007 0.074647402 ∑øµÿ≤˙ …Ó€⁄
000007.SZ ÂÖ®Êñ∞•Ω,13/4/1992 19830311 NA 8 1/1/2007 NA ∑øµÿ≤˙ …Ó€⁄
000008.SZ Á•ûÂ∑ûÈ´òÈìÅ 7/5/1992 19891011 8 1/1/2007 -0.010574387 ◊€∫œ …Ó€⁄
000009.SZ ‰∏≠ÂõΩÂÆùÂÆâ 25/6/1991 19830706 8 1/1/2007 0.009576133 ∑øµÿ≤˙ …Ó€⁄
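
The garbling in the sample is consistent with two different byte encodings being displayed through a Mac Roman viewer: the NAME column looks like UTF-8 bytes, while INDUSTRY and LISTCITY look like GBK bytes. That reading is an inference from the sample, not something the post confirms. A minimal sketch of the check, where 平安银行 and 深圳 are assumed original values for the first row:

```python
# Sketch: reproduce the kind of garbling seen in the sample rows.
# Assumption: the text was displayed by decoding the raw bytes as Mac Roman.
name = "平安银行"  # assumed original NAME value for 000001.SZ
city = "深圳"      # assumed original LISTCITY value

# Encode with the suspected source encoding, then mis-decode as Mac Roman.
garbled_name = name.encode("utf-8").decode("mac_roman")
garbled_city = city.encode("gbk").decode("mac_roman")

print(garbled_name)  # compare with the NAME text in the first sample row
print(garbled_city)  # compare with the LISTCITY text in the sample rows

# Reversing the two steps recovers the original text.
recovered_name = garbled_name.encode("mac_roman").decode("utf-8")
recovered_city = garbled_city.encode("mac_roman").decode("gbk")
print(recovered_name, recovered_city)
```

If the printed strings match what the file shows, the columns may not all share one encoding, in which case a single encoding value passed to pandas.read_csv could fail on some bytes, much like the UnicodeDecodeError quoted in the question.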

import pandas as pd
data06_16 = pd.read_csv("yourfile.csv", encoding="GBK")

Try adding encoding="GBK"; it works well.

As shown in the screenshot:

[screenshot]
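
When it is unclear which encoding a file uses, one option is to test a few candidates against the raw bytes before handing the file to pandas. The sketch below uses only the standard library. The helper name first_working_encoding and the candidate list are illustrative assumptions, not part of pandas.

```python
from pathlib import Path


def first_working_encoding(path, candidates=("utf-8-sig", "gb18030")):
    """Return the first candidate that decodes the whole file, or None.

    Hypothetical helper for illustration. The order matters: UTF-8 is
    strict, so it is tried first; GB18030 is a superset of GBK and GB2312.
    """
    raw = Path(path).read_bytes()
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this candidate cannot decode the bytes; try the next
        return enc
    return None
```

The returned name can then be passed along, for example pd.read_csv("yourfile.csv", encoding=enc). A clean decode does not prove the encoding is the right one, so it is still worth checking a few rows by eye. If the helper returns None, the file may mix encodings or be damaged.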
