I am trying to write a string pulled from an XML file to another file (HTML), but when I try to run the script it gives me this error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 124: ordinal not in range(128)
This is the Python code:
import os
import xml.etree.ElementTree as et

f = open('web/tv.html', 'a')
counter = 0
for showname in os.listdir('xml/additional'):
    tree = et.parse('xml/additional/%s/en.xml' % showname)
    root = tree.getroot()
    series = root.find('Series')
    description = series.find('Overview').text
    cell = '\n<tr><td>' + showname + '</td><td>' + description + '</td></tr>'
    f.write(cell)
f.write(u'</table></div></body></html>')
This is a sample of the XML file:
<Series>
    <Overview>From Emmy Award-winner Dan Harmon comes "Community", a smart comedy series about higher education – and lower expectations. The student body at Greendale Community College is made up of high-school losers, newly divorced housewives, and old people who want to keep their minds active. Within these not-so-hallowed halls, Community focuses on a band of misfits, at the center of which is a fast-talkin' lawyer whose degree has been revoked, who form a study group and end up learning a lot more about themselves than they do about their course work.</Overview>
    <other>stuff</other>
</Series>
Can someone tell me what I'm doing wrong? I find Unicode mightily complicated.
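You are mixing Unicode with bytestrings. The description ElementTree gives you is a Unicode value containing, among other things, an en dash character (U+2013, –). Concatenating it with your bytestrings produces a Unicode string, and that cannot be written to a plain file object without being encoded first: Python 2 tries to encode it implicitly with the default ASCII codec, which fails on the en dash.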
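The failure can be reproduced in isolation. This is a minimal Python 3 sketch (the question's code is Python 2, where the ASCII encode happens implicitly when the Unicode string is written to the file); the shortened description string is illustrative only:

```python
# Python 3 sketch: str is Unicode, so the en dash is a single codepoint.
description = 'higher education \u2013 and lower expectations'

position = description.index('\u2013')
print(position, hex(ord(description[position])))  # 17 0x2013

try:
    # ASCII only covers codepoints 0-127; U+2013 is 8211, so this raises.
    description.encode('ascii')
except UnicodeEncodeError as exc:
    error_message = str(exc)
    print(error_message)
```

The message has the same shape as the one in the question, just with a different position because the string is shorter.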
Encode your description to ASCII text with:
description = description.encode('ascii', 'xmlcharrefreplace')
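which replaces any codepoint outside ASCII with a numeric character reference (such as &#8211; for the en dash), which browsers render as the original character: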
>>> description = u'... a smart comedy series about higher education – and lower expectations.'
>>> description.encode('ascii', 'xmlcharrefreplace')
'... a smart comedy series about higher education &#8211; and lower expectations.'
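Applied to the loop from the question, a minimal Python 3 sketch could look like the following. It assumes the XML root wraps the <Series> element (the question calls root.find('Series')), so the inline SAMPLE_XML uses a hypothetical <Data> root; the helper name table_row and the io.BytesIO buffer standing in for web/tv.html are also illustrative assumptions. Because encode() returns bytes in Python 3, a real file would have to be opened in binary mode, e.g. open('web/tv.html', 'ab').

```python
import io
import xml.etree.ElementTree as et

# Hypothetical stand-in for xml/additional/<showname>/en.xml
SAMPLE_XML = ('<Data><Series><Overview>higher education \u2013 '
              'and lower expectations</Overview></Series></Data>')


def table_row(showname, xml_text):
    """Build one table row and encode it to ASCII-safe bytes (hypothetical helper)."""
    root = et.fromstring(xml_text)
    description = root.find('Series').find('Overview').text
    cell = '\n<tr><td>' + showname + '</td><td>' + description + '</td></tr>'
    # Every codepoint outside ASCII becomes a numeric reference such as &#8211;
    return cell.encode('ascii', 'xmlcharrefreplace')


out = io.BytesIO()  # stands in for open('web/tv.html', 'ab')
out.write(table_row('Community', SAMPLE_XML))
out.write(b'</table></div></body></html>')
html = out.getvalue()
print(html)
```

Encoding the whole cell rather than only the description also covers show names that contain non-ASCII characters.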