Decoding HTML-encoded strings in Python

Question

I have the following string...

"Scam, hoax, or the real deal, he&#8217;s gonna work his way to the bottom of the sordid tale, and hopefully end up with an arcade game in the process."

I need to turn it into this string...

Scam, hoax, or the real deal, he's gonna work his way to the bottom of the sordid tale, and hopefully end up with an arcade game in the process.

This is pretty standard HTML encoding, and I can't for the life of me figure out how to convert it in Python.

I found this: GitHub

It's very close to working; however, it does not output an apostrophe but some odd Unicode character instead.

Here is an example of the output from the GitHub script...

Scam, hoax, or the real deal, heâs gonna work his way to the bottom of the sordid tale, and hopefully end up with an arcade game in the process.

Answer 1

What you're trying to do is called "HTML entity decoding", and it's covered in a number of past Stack Overflow questions.

Here's a code snippet using the Beautiful Soup HTML parsing library to decode your example:

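#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Python 2 code using BeautifulSoup 3 (the convertEntities option belongs to that version).
from BeautifulSoup import BeautifulSoup

string = "Scam, hoax, or the real deal, he&#8217;s gonna work his way to the bottom of the sordid tale, and hopefully end up with an arcade game in the process."
# Parse the string and convert HTML entities such as &#8217; into Unicode characters.
s = BeautifulSoup(string, convertEntities=BeautifulSoup.HTML_ENTITIES).contents[0]
print s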

Here's the output:

Scam, hoax, or the real deal, he’s gonna work his way to the bottom of the sordid tale, and hopefully end up with an arcade game in the process.
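
If you are on Python 3.4 or later, the standard library can probably do this without Beautiful Soup. The sketch below is not part of the original answer: it uses `html.unescape`, and the variable names (`encoded`, `decoded`, `plain`, `mojibake`) are just illustrative. Note that `&#8217;` is U+2019 (a right single quotation mark), not the ASCII apostrophe, so the last step that swaps it for `'` is optional. The final lines show one plausible reason for the "â" in the question, assuming the GitHub script's UTF-8 output was read as Latin-1; that is a guess, since the script itself isn't shown.

```python
import html

encoded = (
    "Scam, hoax, or the real deal, he&#8217;s gonna work his way to "
    "the bottom of the sordid tale, and hopefully end up with an "
    "arcade game in the process."
)

# html.unescape converts numeric (&#8217;) and named (&amp;) character
# references into the corresponding Unicode characters.
decoded = html.unescape(encoded)
print(decoded)

# Optional: replace the typographic quote (U+2019) with a plain ASCII apostrophe.
plain = decoded.replace("\u2019", "'")
print(plain)

# Assumption: the "â" in the question comes from the three UTF-8 bytes of
# U+2019 being decoded as Latin-1, which yields "â" plus two control characters.
mojibake = "\u2019".encode("utf-8").decode("latin-1")
print(repr(mojibake))
```

If that assumption holds, the fix on the question's side would be to decode or display the result as UTF-8 rather than change the entity handling.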

1 answer

solution1
4 ACCEPTED 2009-05-27 04:49:52
