
How to convert special characters into HTML entities?

In Python, I want to convert special characters such as "%$!&@á é ©" into HTML entities, not only '<&">', which is all that the documentation and references I've found so far cover. cgi.escape doesn't solve the problem.

For example, the string "á ê ĩ &" should be converted to "&aacute; &ecirc; &itilde; &amp;".

Does anybody know how to solve this? I'm using Python 2.6.

You could write your own loop using the dictionaries in the htmlentitydefs module: http://docs.python.org/library/htmllib.html#module-htmlentitydefs

The one you're looking for is htmlentitydefs.codepoint2name, which maps Unicode code points to HTML entity names.
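A minimal sketch of such a loop, not a definitive implementation. The function name escape_named is made up for illustration, and the try/except import is an assumption added so the same code runs on Python 3, where the module is called html.entities. Characters that have no named entity in codepoint2name (for example "ĩ") are left unchanged here.

```python
try:
    from html.entities import codepoint2name  # Python 3 location
except ImportError:
    from htmlentitydefs import codepoint2name  # Python 2 location


def escape_named(text):
    """Replace every character that has a named entity with "&name;".

    `text` is expected to be a unicode string. Each character is looked
    up exactly once, so the "&" of an inserted entity is never escaped
    a second time.
    """
    pieces = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        pieces.append(u'&%s;' % name if name else ch)
    return u''.join(pieces)
```

Calling escape_named(u"á é ©") walks the string one character at a time and substitutes the entity wherever codepoint2name has an entry for that code point.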

I found a solution by searching for htmlentitydefs.codepoint2name, which @Ruben Vermeersch mentioned in his answer. It comes from here: http://bytes.com/topic/python/answers/594350-convert-unicode-chars-html-entities

Here's the function:

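from htmlentitydefs import codepoint2name  # Python 2 module

# Map each character that has a named entity to "&name;". "&" (code 38) is
# excluded and handled first, so the "&" of inserted entities is not re-escaped.
ENTITIES = dict((unichr(code), u'&%s;' % name)
                for code, name in codepoint2name.iteritems() if code != 38)

def htmlescape(text):
    # Decode UTF-8 byte strings; unicode objects are used as they are.
    if isinstance(text, str):
        text = text.decode('utf-8')
    text = text.replace(u'&', u'&amp;')
    for char, entity in ENTITIES.iteritems():
        if char in text:
            text = text.replace(char, entity)
    return text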
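The function above calls replace once per entity. As a hedged alternative, here is a single-pass sketch built on the translate method of unicode strings. The names NAMED and htmlescape_single_pass are hypothetical, and falling back to numeric character references for characters without a named entity (such as "ĩ") is my own assumption about what is wanted, not part of the original solution.

```python
try:
    from html.entities import codepoint2name  # Python 3 location
except ImportError:
    from htmlentitydefs import codepoint2name  # Python 2 location

# Translation table: code point -> "&name;" for every named entity,
# including "&" itself (code 38).
NAMED = dict((code, u'&%s;' % name) for code, name in codepoint2name.items())


def htmlescape_single_pass(text):
    """Escape a unicode string in one pass.

    Characters with a named entity become "&name;". Any remaining
    non-ASCII character becomes a numeric reference such as "&#297;".
    """
    named = text.translate(NAMED)
    # xmlcharrefreplace turns leftover non-ASCII characters into "&#NNN;".
    return named.encode('ascii', 'xmlcharrefreplace').decode('ascii')
```

Because translate handles each input character once, "&" needs no special treatment, and the result is plain ASCII.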

Thank you all for helping! ;)
