
How to convert HTML escape sequences (character entities) to characters in Python

I'm new to Python, thanks for helping me. I just want to convert HTML escape sequences (character entities) to the characters they stand for, e.g. change &lt; to <. But one HTML page has many different escape sequences, and I can't write a replace statement for every one of them, like:

str = str.replace('&nbsp;', ' ')

# ... many more replace calls like these ...

str = str.replace('&lt;', '<')
str = str.replace('&gt;', '>')

This gets very long. I just want a single function that solves the problem easily. Thank you very much.

Use HTMLParser.HTMLParser:

>>> from HTMLParser import HTMLParser
>>> # from html.parser import HTMLParser  # Python 3.0-3.8 only; unescape() was removed in 3.9
>>> 
>>> parser = HTMLParser()
>>> parser.unescape('&gt;_&lt;')
u'>_<'
>>> parser.unescape('&#48;&#49;&#x32;')
u'012'
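
On Python 3.4 and later, the standard library also offers html.unescape as a module-level function, so no parser object is needed. A minimal sketch of the same two calls, assuming the input is already a str (the variable names are only for illustration):

```python
import html

# html.unescape handles named, decimal, and hexadecimal character references.
decoded_named = html.unescape('&gt;_&lt;')
decoded_numeric = html.unescape('&#48;&#49;&#x32;')

print(decoded_named)    # >_<
print(decoded_numeric)  # 012
```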

NOTE: parser.unescape('&nbsp;') returns NO-BREAK SPACE (U+00A0), not an ordinary SPACE.

>>> parser.unescape('&nbsp;')
u'\xa0'
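
If an ordinary space is what you want, as in the question's replace('&nbsp;', ' ') call, one option is to replace U+00A0 after unescaping. A sketch for Python 3.4+ using html.unescape from the standard library; the helper name unescape_with_plain_spaces is hypothetical:

```python
import html

def unescape_with_plain_spaces(text):
    """Unescape HTML entities, then turn NO-BREAK SPACE into a plain space."""
    return html.unescape(text).replace('\xa0', ' ')

result = unescape_with_plain_spaces('a&nbsp;&lt;&nbsp;b')
print(repr(result))  # 'a < b'
```

Note that this also converts any no-break spaces that were in the page as literal characters, which may or may not be what you want.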

BTW, don't use str as a variable name; it shadows the built-in str type.
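
To see why, here is a small sketch (the function name shadowing_demo is made up for illustration) in which a local variable named str hides the built-in, so a later call to str() fails:

```python
def shadowing_demo():
    str = '&lt;b&gt;'      # the local name now hides the built-in str type
    try:
        str(42)            # this tries to call a string object, not the type
    except TypeError as exc:
        return type(exc).__name__
    return None

error_name = shadowing_demo()
print(error_name)  # TypeError
```

Picking another name such as text or page avoids the problem.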
