How to read in Chinese text and write Chinese characters to csv - Python 3

Question

I've searched SO but have not been able to find the answer to this specific problem. I am trying to read in from a .txt file of Chinese characters. When I try to write to a .csv, the contents of cells look like this:

b'\\xef\\xbb\\xbf\\xe5'

as opposed to:

山西襄汾

How can I write the latter form to the .csv? A snippet of the relevant code is below:

infilehandle = open(infilepath, encoding = 'utf-8') # open .txt file
txtlines = infilehandle.read().replace('\n', '')
date_pattern = re.compile(r'(\d{4}.\d{1,2}.\d{1,2})')
date = date_pattern.findall(txtlines)[0]
title = txtlines.split(date)[0]
localrow = []
localrow.append(date.encode("utf-8-sig"))
localrow.append(title.encode("utf_8_sig"))
outfilehandle.writerow(localrow) # writes to .csv

Answer 1

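First, make sure to create outfilehandle on a file opened with encoding='utf-8', as suggested by Peter Wood. The csv module documentation also recommends passing newline='' when opening the file, like so:

outfilehandle = csv.writer(open('outfile.csv', 'w', encoding='utf-8', newline=''))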

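Then there is no need to call date.encode("utf-8-sig"). In Python 3, csv.writer expects str values, and a bytes object is written as its literal b'...' representation, which is what you are seeing in the cells. Just change lines 7-8 of your code snippet to: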

localrow.append(date)
localrow.append(title)
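
To see why dropping the encode() call fixes the output, here is a minimal sketch that writes the same value once as bytes and once as str. It uses an in-memory buffer instead of a real file, and the variable names are illustrative only; the sample string is the one from the question.

```python
import csv
import io

title = "山西襄汾"

# Writing encoded bytes: csv.writer converts the value with str(),
# so the cell holds the bytes literal rather than the characters.
buf_bytes = io.StringIO()
csv.writer(buf_bytes).writerow([title.encode("utf-8-sig")])
bytes_cell = buf_bytes.getvalue().strip()

# Writing the str directly: the cell holds the characters themselves.
buf_text = io.StringIO()
csv.writer(buf_text).writerow([title])
text_cell = buf_text.getvalue().strip()

print(bytes_cell)
print(text_cell)
```

The encoding step belongs to the file object (the encoding argument of open), not to the individual values passed to writerow.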

Also, it may be helpful to read the Python Unicode HOWTO and Processing Text Files in Python 3.
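
Putting the pieces together, the whole flow might look like the sketch below. It is only an illustration under some assumptions: the helper names extract_row and write_rows are hypothetical, the sample text (including its date) is made up, and a temporary directory stands in for the real output path.

```python
import csv
import os
import re
import tempfile

# Same pattern as in the question, written as a raw string.
DATE_PATTERN = re.compile(r'(\d{4}.\d{1,2}.\d{1,2})')


def extract_row(text):
    """Return [date, title] as str objects, following the question's logic."""
    flat = text.replace('\n', '')
    date = DATE_PATTERN.findall(flat)[0]
    title = flat.split(date)[0]
    return [date, title]


def write_rows(outfilepath, rows, encoding='utf-8'):
    """Write rows of str values; the file object handles the encoding."""
    # newline='' lets the csv module control line endings itself.
    with open(outfilepath, 'w', encoding=encoding, newline='') as outfile:
        csv.writer(outfile).writerows(rows)


# Made-up sample input standing in for the contents of the .txt file.
sample_text = "山西襄汾\n2018.11.15\n正文"
row = extract_row(sample_text)

with tempfile.TemporaryDirectory() as tmpdir:
    outfilepath = os.path.join(tmpdir, 'outfile.csv')
    write_rows(outfilepath, [row])
    # Read the file back to check that the characters survived the round trip.
    with open(outfilepath, encoding='utf-8', newline='') as infile:
        rows_read = list(csv.reader(infile))

print(rows_read)
```

If the .csv is meant to be opened in Excel, passing encoding='utf-8-sig' to write_rows may help, since that codec writes a byte order mark at the start of the file that Excel commonly uses to detect UTF-8.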
