I've searched SO but have not been able to find the answer to this specific problem. I am reading Chinese text from a .txt file. When I write it to a .csv file, the cells look like this:
b'\\xef\\xbb\\xbf\\xe5'
as opposed to:
山西襄汾
How can I write the characters themselves to the .csv instead? The relevant code is below:
infilehandle = open(infilepath, encoding = 'utf-8') # open .txt file
txtlines = infilehandle.read().replace('\n', '')
date_pattern = re.compile(r'(\d{4}.\d{1,2}.\d{1,2})')
date = date_pattern.findall(txtlines)[0]
title = txtlines.split(date)[0]
localrow = []
localrow.append(date.encode("utf-8-sig"))
localrow.append(title.encode("utf_8_sig"))
outfilehandle.writerow(localrow) # writes to .csv
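The symptom can be reproduced in isolation. The leading \xef\xbb\xbf in the cell is the UTF-8 byte-order mark that encode("utf-8-sig") prepends, and in Python 3 csv.writer turns any non-str field into text with str(), which for a bytes object is its repr. A minimal sketch, writing to an in-memory io.StringIO buffer instead of a real file (the buffer and variable names are illustrative only):

```python
import csv
import io

title = "山西襄汾"

# encode("utf-8-sig") returns bytes that begin with the BOM b'\xef\xbb\xbf'.
encoded = title.encode("utf-8-sig")
print(encoded[:3])  # b'\xef\xbb\xbf'

# csv.writer stringifies non-str fields with str(); for bytes that is the repr.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow([encoded])  # cell becomes the text b'\xef\xbb\xbf...'
writer.writerow([title])    # cell holds the characters themselves

bytes_cell, text_cell = buf.getvalue().splitlines()
print(bytes_cell.startswith("b'\\xef\\xbb\\xbf"))  # True
print(text_cell)  # 山西襄汾
```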
First, make sure to create outfilehandle with encoding='utf-8', as suggested by Peter Wood, and pass newline='' as the csv module documentation recommends for files given to csv.writer:
outfilehandle = csv.writer(open('outfile.csv', 'w', encoding='utf-8', newline=''))
Then there is no need to call .encode("utf-8-sig") on date or title; just replace the two localrow.append(...) lines in your snippet with:
localrow.append(date)
localrow.append(title)
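Putting both steps together, here is a minimal end-to-end sketch. Assumptions: the input holds a title followed by a date such as 2015.3.12 (matching the question's regex); the function name txt_to_csv_row and the sample file contents are hypothetical; and the output uses 'utf-8-sig' rather than plain 'utf-8' on the guess that the original .encode("utf-8-sig") was meant to help Excel recognise UTF-8. Plain 'utf-8' produces the same characters, just without the BOM. Opening the input with 'utf-8-sig' also drops a leading BOM if the .txt happens to have one.

```python
import csv
import re
import tempfile
from pathlib import Path


def txt_to_csv_row(infilepath, outfilepath):
    """Hypothetical helper: write the first date and the title before it as one CSV row."""
    # 'utf-8-sig' decodes UTF-8 and strips a leading BOM if present.
    with open(infilepath, encoding="utf-8-sig") as infile:
        txtlines = infile.read().replace("\n", "")

    # Raw string so \d is passed to re instead of being treated as a string escape.
    date_pattern = re.compile(r"(\d{4}.\d{1,2}.\d{1,2})")
    date = date_pattern.findall(txtlines)[0]
    title = txtlines.split(date)[0]

    # The encoding is set on the file object; the row holds plain str values.
    # 'utf-8-sig' writes a BOM at the start of the file (assumed useful for Excel).
    with open(outfilepath, "w", encoding="utf-8-sig", newline="") as outfile:
        writer = csv.writer(outfile)
        writer.writerow([date, title])  # no .encode() calls


# Usage with a throwaway input file (sample text is made up for illustration).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "in.txt"
    dst = Path(tmp) / "outfile.csv"
    src.write_text("山西襄汾\n2015.3.12\n", encoding="utf-8")
    txt_to_csv_row(src, dst)
    with open(dst, encoding="utf-8-sig", newline="") as f:
        rows = list(csv.reader(f))

print(rows)  # [['2015.3.12', '山西襄汾']]
```

Using with blocks also guarantees both files are closed (and the CSV flushed) even if the regex finds no date and an IndexError is raised.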
Also, it may help to read the Python Unicode HOWTO and "Processing Text Files in Python 3".