
Python 2.7, unicode - ordinal not in range

I am trying to write a string pulled from an XML file to another file (HTML), but when I try to run the script it gives me this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 124: ordinal not in range(128)

This is the Python code:

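import os
import xml.etree.ElementTree as et  # assumed: the original does not show what `et` is

f = open('web/tv.html', 'a')
counter = 0
for showname in os.listdir('xml/additional'):
    tree = et.parse('xml/additional/%s/en.xml' % showname)
    root = tree.getroot()
    series = root.find('Series')
    description = series.find('Overview').text
    cell = '\n<tr><td>' + showname + '</td><td>' + description + '</td></tr>'
    f.write(cell)  # the UnicodeEncodeError is raised here
f.write(u'</table></div></body></html>')  # file objects have no append(); use write()
f.close()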

This is a sample of the XML file:

<Series>
  <Overview>From Emmy Award-winner Dan Harmon comes &quot;Community&quot;, a smart comedy series about higher education – and lower expectations. The student body at Greendale Community College is made up of high-school losers, newly divorced housewives, and old people who want to keep their minds active. Within these not-so-hallowed halls, Community focuses on a band of misfits, at the center of which is a fast-talkin' lawyer whose degree has been revoked, who form a study group and end up learning a lot more about themselves than they do about their course work.</Overview>
  <other>stuff</other>
</Series>

Can someone tell me what I'm doing wrong? I find Unicode mightily complicated.

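You are mixing Unicode with bytestrings; the XML results are Unicode values containing, among other things, an en dash character (U+2013). When such a value is written to a file opened with open(), Python 2 tries to encode it as ASCII, and the en dash has no ASCII representation, so the result must be encoded explicitly first.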
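To see the failure in isolation, here is a minimal sketch that encodes a Unicode string with the ASCII codec directly, which is what Python 2 attempts behind the scenes in f.write(cell). The sample string is a shortened excerpt of the Overview text, and the variable names are only for illustration.

```python
# -*- coding: utf-8 -*-
# Shortened excerpt of the Overview text; \u2013 is the en dash.
text = u'higher education \u2013 and lower expectations'

error = None
try:
    # The ASCII codec only covers code points 0-127, so this raises.
    text.encode('ascii')
except UnicodeEncodeError as exc:
    error = exc
    print(exc)
```

The exception records which character could not be encoded (error.object[error.start]), which is how the traceback is able to name u'\u2013' and its position.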

Encode your description to ASCII text with:

description = description.encode('ascii', 'xmlcharrefreplace')

which replaces any codepoint beyond ASCII with a numeric character reference that HTML understands:

>>> description = u'... a smart comedy series about higher education – and lower expectations.'
>>> description.encode('ascii', 'xmlcharrefreplace')
'... a smart comedy series about higher education &#8211; and lower expectations.'
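A different approach, not part of the answer above, is to keep the text as Unicode and let the file object do the encoding. The following is a rough sketch under assumptions: it uses io.open with encoding='utf-8' (available in Python 2.7 and 3), writes to a temporary file rather than web/tv.html, and uses a hard-coded show name and description in place of the values read from the XML.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Stand-ins for the values pulled from the XML file (illustrative only).
showname = u'Community'
description = u'higher education \u2013 and lower expectations'

# Build the row entirely from unicode strings.
cell = u'\n<tr><td>' + showname + u'</td><td>' + description + u'</td></tr>'

# io.open encodes each unicode string as UTF-8 on write.
path = os.path.join(tempfile.mkdtemp(), 'tv.html')
with io.open(path, 'a', encoding='utf-8') as f:
    f.write(cell)

# Read the raw bytes back to confirm what ended up on disk.
with io.open(path, 'rb') as f:
    raw = f.read()
```

With this variant the en dash is stored as UTF-8 bytes rather than as &#8211;, so the HTML page would need to declare UTF-8 as its character set for browsers to display it correctly.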
