Why doesn't html unescape work here?

Question

This is how I am getting the data:

import requests
import bs4
page = requests.get('some website')
data = bs4.BeautifulSoup(page.content, "lxml")

I'm using this to do the unescaping:

from xml.sax.saxutils import unescape

# Extra entities to handle on top of the &amp;, &lt; and &gt; that unescape() already covers.
html_escape_table = {'"': "&quot;", "'": "&apos;"}
html_unescape_table = {v: k for k, v in html_escape_table.items()}

def html_unescape(text):
    return unescape(text, html_unescape_table)
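
As a quick check of what this helper covers, here is a minimal sketch. The sample strings are made up for illustration and are not taken from the scraped page. It assumes Python 3.4 or later, where the standard library also provides html.unescape, which knows the full set of named HTML entities. The helper above only replaces the entities listed in its table, plus the three that xml.sax.saxutils.unescape always handles.

```python
import html
from xml.sax.saxutils import unescape

# Same table and helper as above.
html_escape_table = {'"': "&quot;", "'": "&apos;"}
html_unescape_table = {v: k for k, v in html_escape_table.items()}

def html_unescape(text):
    return unescape(text, html_unescape_table)

# Entities in the table (and &amp;, &lt;, &gt;) are replaced.
quoted = html_unescape('&quot;Hi&quot; &amp; &apos;bye&apos;')

# A named entity that is not in the table is returned unchanged.
# '&eacute;' is a hypothetical sample, not the asker's actual data.
untouched = html_unescape('caf&eacute;')

# html.unescape resolves all standard named entities.
resolved = html.unescape('caf&eacute;')

print(quoted)
print(untouched)
print(resolved)
```

If the scraped text contains entities beyond the two in the table, this would look exactly like "it returns the same string", so it is worth ruling out before looking at the string type.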

When I call html_unescape() on any part of data (which I believe is a string), nothing gets unescaped. It just returns the same string that I passed in (e.g. \è).

However, when I call html_unescape() on a string that I typed in myself (e.g. html_unescape('\è')), it works.

Why doesn't it work when I pass in a string taken from the data I got with BeautifulSoup?

Answer 1

Standard Python 2 would print <type 'str'>, not <class 'str'>, so under Python 2 you must have received a custom str class. You'll need to track down where it came from (requests? BeautifulSoup?) and see what operations it supports. Note that on Python 3, <class 'str'> is what the built-in str itself prints.
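
One way to do that tracking down is sketched below, assuming Python 3. TaggedStr is a hypothetical stand-in for a library-provided str subclass (BeautifulSoup's NavigableString is one such subclass), and describe is a helper invented here for illustration. The sketch reports the exact type of a value, whether it behaves as a str, and whether it is exactly the built-in str.

```python
class TaggedStr(str):
    """Hypothetical stand-in for a str subclass returned by a library."""


def describe(value):
    """Return (type name, is it a str?, is it exactly the built-in str?)."""
    return type(value).__name__, isinstance(value, str), type(value) is str


plain = "some text"
wrapped = TaggedStr("some text")

print(describe(plain))
print(describe(wrapped))

# Calling str() on a subclass instance gives back a plain built-in str,
# which is a simple way to rule the subclass out as the cause.
converted = str(wrapped)
print(type(converted) is str)
```

If the value passes isinstance(value, str) and the result is still unchanged after converting it with str(), the type is probably not the problem and the content of the string deserves a closer look.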

solution1
0 ACCEPTED 2015-12-24 21:55:41
