Python Converting Characters from Unicode to HTML

Question

Hey guys, I am trying to convert this in Python 2.7.3:

the+c\xf8\xf8n

to the html string:

the+c%C3%B8%C3%B8n

It was originally the c\xf8\xf8n, but I used a replace to put a + in place of the space.

I'm not entirely sure what convention the latter uses. I would use a string replace, but the replacement is different for each character.

Thoughts? Thanks, guys.

Answer 1

You are URL-encoding (percent-encoding), not HTML-encoding. Use urllib.quote:

from urllib import quote

but make sure you encode your Unicode string to UTF-8 first, because in Python 2 quote can't handle non-ASCII unicode input:

quote(inputstring.encode('utf8'))

This will also percent-encode the + (as %2B); if you meant it to stand for a space, you need to mark it as safe:

quote(inputstring.encode('utf8'), '+')

The latter form gives:

>>> from urllib import quote
>>> inputstring = u'the+c\xf8\xf8n'
>>> quote(inputstring.encode('utf8'), '+')
'the+c%C3%B8%C3%B8n'
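For completeness, here is a minimal sketch that wraps the steps above. It assumes you want the same file to run on Python 2.7 and on Python 3, where quote and quote_plus moved to urllib.parse. The helper names url_encode and percent_encode_by_hand are hypothetical. The second helper exists only to spell out the convention: letters, digits and _.- pass through, and every other character is encoded to UTF-8 with each resulting byte written as %XX. That is why a single character such as U+00F8 (UTF-8 bytes C3 B8) becomes two escapes, %C3%B8. The sketch also calls quote_plus, which turns spaces into + itself, so the manual replace on the original "the c\xf8\xf8n" isn't needed.

```python
# Minimal sketch: percent-encode Unicode text as UTF-8 for use in a URL.
# Assumption: the same file should run on both Python 2.7 and Python 3.
try:
    from urllib.parse import quote, quote_plus  # Python 3
except ImportError:
    from urllib import quote, quote_plus  # Python 2.7


def url_encode(text, safe=''):
    """Hypothetical helper: UTF-8 encode `text`, then percent-encode it.

    Characters listed in `safe` are kept literally; anything else outside
    letters, digits and '_.-' is turned into %XX escapes.
    """
    return quote(text.encode('utf8'), safe)


def percent_encode_by_hand(text):
    """Hypothetical helper that spells out the convention quote() follows.

    Letters, digits and '_.-' pass through unchanged; every other character
    is encoded to UTF-8 and each resulting byte is written as %XX.
    """
    safe_chars = ('ABCDEFGHIJKLMNOPQRSTUVWXYZ'
                  'abcdefghijklmnopqrstuvwxyz'
                  '0123456789_.-')
    out = []
    for byte in bytearray(text.encode('utf8')):  # yields ints on Py2 and Py3
        ch = chr(byte)
        out.append(ch if ch in safe_chars else '%%%02X' % byte)
    return ''.join(out)


if __name__ == '__main__':
    print(url_encode(u'the+c\xf8\xf8n', '+'))            # the+c%C3%B8%C3%B8n
    print(url_encode(u'the+c\xf8\xf8n'))                 # the%2Bc%C3%B8%C3%B8n
    print(quote_plus(u'the c\xf8\xf8n'.encode('utf8')))  # the+c%C3%B8%C3%B8n
    print(percent_encode_by_hand(u'the c\xf8\xf8n'))     # the%20c%C3%B8%C3%B8n
```

If your input still has its original space, quote_plus is probably the simpler choice: it produces the + for the space itself and still escapes a literal + as %2B, so the two cannot be confused.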