Convert string from xmlcharrefreplace back to utf-8

I have the following code:

In [8]: st = u"опа"

In [11]: st.encode("ascii", "xmlcharrefreplace")
Out[11]: '&#1086;&#1087;&#1072;'

In [14]: st1 = st.encode("ascii", "xmlcharrefreplace")

In [15]: st1.decode("ascii", "xmlcharrefreplace")
Out[15]: u'&#1086;&#1087;&#1072;'

In [16]: st1.decode("utf-8", "xmlcharrefreplace")
Out[16]: u'&#1086;&#1087;&#1072;'

How can I convert st1 back to u"опа"?

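The xmlcharrefreplace error handler only applies when encoding, so decode() cannot reverse it. Use the html.unescape() function instead (Python 3.4 and newer):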

>>> import html
>>> html.unescape('&#1086;&#1087;&#1072;')
'опа'
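As a quick check, the full round trip on Python 3.4+ might look like the sketch below. The variable names are illustrative only. Note that encode() returns bytes on Python 3, so the result is decoded to str before unescaping.

```python
import html

original = "опа"

# Encode to ASCII; characters outside ASCII become numeric character references.
escaped = original.encode("ascii", "xmlcharrefreplace")  # a bytes object

# html.unescape() expects str, so decode the pure-ASCII bytes first.
restored = html.unescape(escaped.decode("ascii"))

print(escaped)
print(restored)
```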

On older versions (including Python 2), use the unescape() method of an HTMLParser.HTMLParser() instance:

>>> from HTMLParser import HTMLParser
>>> parser = HTMLParser()
>>> parser.unescape('&#1086;&#1087;&#1072;')
u'\u043e\u043f\u0430'
>>> print parser.unescape('&#1086;&#1087;&#1072;')
опа
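If the same code has to run on both Python 2 and Python 3.4+, one possible approach is to pick the function at import time. This is only a sketch: the name `unescape` is bound locally here for illustration, and it assumes nothing beyond the two standard-library locations mentioned above.

```python
try:
    # Python 3.4 and newer
    from html import unescape
except ImportError:
    # Python 2: fall back to the method on an HTMLParser instance
    from HTMLParser import HTMLParser
    unescape = HTMLParser().unescape

# The character references from the question, written out literally.
restored = unescape(u"&#1086;&#1087;&#1072;")
print(restored)
```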

