This is how I am getting the data:
import bs4
import requests
page = requests.get('some website')
data = bs4.BeautifulSoup(page.content, "lxml")
I'm using this to do the unescaping:
from xml.sax.saxutils import unescape
html_escape_table = {'"': "&quot;", "'": "&apos;"}
html_unescape_table = {v: k for k, v in html_escape_table.items()}
def html_unescape(text):
    return unescape(text, html_unescape_table)
When I try to call unescape on any part of data (which I believe is a string), it doesn't do the unescaping as it should. Instead, it just returns the same string that I called the function with (ex. \è).
However, when I call html_unescape() with a string that I typed in by hand (ex. html_unescape('\è')), it works.
Why doesn't it work when I pass in a piece of string from the data I got with BeautifulSoup?
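One way to narrow this down is to check which entities the helper converts at all. The sketch below assumes the table holds only &quot; and &apos;, and compares the helper with html.unescape from the Python 3 standard library; the sample string is made up for illustration.

```python
from html import unescape as html5_unescape
from xml.sax.saxutils import unescape as sax_unescape

# Assumed table: the two entities the helper is meant to reverse.
html_unescape_table = {"&quot;": '"', "&apos;": "'"}

def html_unescape(text):
    # saxutils.unescape handles &amp;, &lt;, &gt; plus the entries supplied here.
    return sax_unescape(text, html_unescape_table)

# Hypothetical sample mixing entities inside and outside the table.
sample = "&quot;caf&egrave;&quot; &amp; more"

print(html_unescape(sample))   # named entities outside the table are left alone
print(html5_unescape(sample))  # html.unescape decodes HTML5 named entities
```

If the text coming out of BeautifulSoup contains entities outside the table, the helper would return them unchanged, which could look like "nothing happened".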
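Standard Python 2 would print <type 'str'>, not <class 'str'>, so if you are on Python 2 you must have received a custom str class. (On Python 3, <class 'str'> is what the built-in str prints, so the output alone does not prove anything there.) Either way, you'll need to track down where the value came from (requests? BeautifulSoup?) and see what operations it supports.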
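A minimal sketch of that kind of inspection, assuming Python 3. The describe helper and the TaggedString class are hypothetical stand-ins, not part of requests or BeautifulSoup; in practice you would pass in the value pulled out of data.

```python
def describe(value):
    """Report the concrete type of a value and whether it behaves like str."""
    cls = type(value)
    return {
        "type": f"{cls.__module__}.{cls.__qualname__}",
        "is_str": isinstance(value, str),
        "is_exact_str": cls is str,
        "mro": [c.__name__ for c in cls.__mro__],
    }

class TaggedString(str):
    """Hypothetical stand-in for a library-defined str subclass."""

plain = describe("plain")
wrapped = describe(TaggedString("wrapped"))
# Calling str() on a subclass instance yields a plain built-in str.
converted = describe(str(TaggedString("wrapped")))

print(plain)
print(wrapped)
print(converted)
```

If the value turns out to be a str subclass, converting it with str() before unescaping is a cheap way to rule the subclass out as the cause.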