
How to read in Chinese text and write Chinese characters to csv - Python 3

I've searched SO but have not been able to find the answer to this specific problem. I am trying to read in from a .txt file of Chinese characters. When I try to write to a .csv, the contents of cells look like this:

b'\\xef\\xbb\\xbf\\xe5'

as opposed to:

山西襄汾

How can I write the latter form to the .csv? A snippet of the relevant code is below:

infilehandle = open(infilepath, encoding = 'utf-8') # open .txt file
txtlines = infilehandle.read().replace('\n', '')
date_pattern = re.compile(r'(\d{4}.\d{1,2}.\d{1,2})')  # raw string avoids invalid-escape warnings
date = date_pattern.findall(txtlines)[0]
title = txtlines.split(date)[0]
localrow = []
localrow.append(date.encode("utf-8-sig"))
localrow.append(title.encode("utf_8_sig"))
outfilehandle.writerow(localrow) # writes to .csv

First, make sure to create outfilehandle with encoding='utf-8', as suggested by Peter Wood, like so:

outfilehandle = csv.writer(open('outfile.csv', 'w', encoding='utf-8', newline=''))  # newline='' is what the csv docs recommend

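Then there is no need to call date.encode("utf-8-sig"). In Python 3, encode() returns a bytes object, and csv.writer writes the str() of any non-string value, which is where cells like b'\\xef\\xbb\\xbf\\xe5' come from. Pass the strings through unchanged by changing lines 7-8 of your code snippet to: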

localrow.append(date)
localrow.append(title)
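
Putting both changes together, here is a minimal end-to-end sketch. It is not the asker's actual script: the helper names (txt_to_row, write_rows, demo), the temporary file names, and the sample date 2008.9.8 are all assumptions made up for illustration. Only the regex and the split logic come from the question.

```python
import csv
import re
import tempfile
from pathlib import Path

# Same pattern as in the question, written as a raw string.
DATE_PATTERN = re.compile(r'(\d{4}.\d{1,2}.\d{1,2})')


def txt_to_row(text):
    """Build [date, title] from the text, mirroring the question's logic."""
    text = text.replace('\n', '')
    date = DATE_PATTERN.findall(text)[0]
    title = text.split(date)[0]
    # Keep both values as str; the file object does the encoding.
    return [date, title]


def write_rows(csv_path, rows):
    """Write rows of str to a UTF-8 CSV file."""
    with open(csv_path, 'w', encoding='utf-8', newline='') as f:
        csv.writer(f).writerows(rows)


def demo():
    """Round-trip hypothetical sample text through a .txt and a .csv file."""
    with tempfile.TemporaryDirectory() as tmp:
        infilepath = Path(tmp) / 'infile.txt'    # hypothetical input file
        outfilepath = Path(tmp) / 'outfile.csv'  # hypothetical output file
        # Hypothetical content: the title from the question plus an invented date.
        infilepath.write_text('山西襄汾\n2008.9.8\n', encoding='utf-8')

        with open(infilepath, encoding='utf-8') as infilehandle:
            row = txt_to_row(infilehandle.read())
        write_rows(outfilepath, [row])

        # Read the CSV back to check that the cells hold real characters.
        with open(outfilepath, encoding='utf-8', newline='') as f:
            return list(csv.reader(f))


if __name__ == '__main__':
    print(demo())
```

If the .csv is going to be opened in Excel, passing encoding='utf-8-sig' to open() in write_rows may be worth trying, since that writes the byte-order mark at the start of the file rather than inside each cell.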

Also, it may be helpful to read the Python Unicode HOWTO and Processing Text Files in Python 3.
