
Why doesn't html unescape work here?

This is how I am getting the data:

import bs4
import requests
page = requests.get('some website')
data = bs4.BeautifulSoup(page.content, "lxml")

I'm using this to do the unescaping:

from xml.sax.saxutils import unescape
html_escape_table = {'"': "&quot;", "'": "&apos;"}
html_unescape_table = {v:k for k,v in html_escape_table.items()}

def html_unescape(text):
    return unescape(text,html_unescape_table)

When I call unescape on any part of data (which I believe is a string), nothing gets unescaped. It just returns the same string I passed in (ex. ).

However, when I call html_unescape() with a string that I typed in by hand (e.g. html_unescape('\è')), it works.

Why doesn't it work when I pass in a piece of string from the data I got with BeautifulSoup?
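One thing worth checking is which entities the helper above can decode at all. As far as the standard library goes, xml.sax.saxutils.unescape only handles &amp;, &lt; and &gt; plus whatever is in the table passed to it, while html.unescape (Python 3.4+) decodes all named and numeric character references. A minimal sketch of the difference, where the sample string is made up for illustration and is not taken from the scraped page:

```python
import html
from xml.sax.saxutils import unescape

# Same mapping as the helper in the question: entity -> character.
html_unescape_table = {"&quot;": '"', "&apos;": "'"}


def html_unescape(text):
    # saxutils.unescape decodes &amp;, &lt;, &gt; plus the extra entities given here.
    return unescape(text, html_unescape_table)


# Made-up sample string (an assumption, not data from the question).
sample = "caf&eacute; &quot;ok&quot; &lt;b&gt; &amp; more"

sax_result = html_unescape(sample)  # &eacute; is not in the table, so it stays
std_result = html.unescape(sample)  # every named reference is decoded

print(sax_result)
print(std_result)
```

If the text coming out of the page contains entities beyond the two in the table, the saxutils-based helper would return them unchanged, which could look like "nothing happened".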

Standard Python 2 would print <type 'str'>, not <class 'str'> (Python 3 prints <class 'str'> for a plain string), so on Python 2 you must have received a custom str class. You'll need to track down where that came from (requests? BeautifulSoup?) and see what operations it supports.
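To check whether a value is a plain str or some subclass of it, comparing type(value) against str is usually enough. A rough sketch follows. TaggedStr is a hypothetical stand-in for whatever class the library hands back, not a real requests or BeautifulSoup type, and the code assumes Python 3:

```python
import html


class TaggedStr(str):
    """Hypothetical stand-in for a library-specific str subclass."""


value = TaggedStr("&quot;hi&quot;")

print(type(value))              # shows the subclass rather than plain str
print(isinstance(value, str))   # a subclass still passes isinstance checks
print(type(value) is str)       # but it is not exactly str

# Converting with str() yields a plain built-in string,
# which can then be handed to any unescape function.
plain = str(value)
decoded = html.unescape(plain)
print(type(plain), decoded)
```

Converting to a plain str first takes the custom class out of the picture, so any remaining difference in behaviour has to come from the text itself.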

