
Removing BOM characters from an lxml HtmlElement in Python

I am loading the HTML markup from a URL as shown below and then running some XPath queries against it, but the page source is littered with BOM characters (U+FEFF). How do I remove them before I run the XPath queries?

import lxml.html
import requests

session = requests.Session()
page = session.get(url)
page_data = lxml.html.fromstring(page.text)

Output:

 u'Re\ufeffverse \ufeffFleece \ufeffHoo\ufeffded S\ufeffwea\ufefftshi\ufeffrt'
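
One way is to serialize the parsed tree back to markup, strip the BOM character references, and parse the result again:

session = requests.Session()
page = session.get(url)
page_data = lxml.html.fromstring(page.text)

# tostring() returns ASCII bytes by default, so each U+FEFF is written as &#65279;
markup = lxml.html.tostring(page_data).replace(b'&#65279;', b'')
page_data = lxml.html.fromstring(markup)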
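
An alternative that avoids serializing and re-parsing is to strip the BOM characters from the response text before it reaches the parser. The following is a minimal sketch using only the standard library; the helper name strip_bom is hypothetical, and the sample string is the output shown in the question.

```python
BOM = u'\ufeff'  # U+FEFF, ZERO WIDTH NO-BREAK SPACE, used as the byte order mark


def strip_bom(text):
    """Return text with every U+FEFF character removed (hypothetical helper)."""
    return text.replace(BOM, u'')


# Sample taken from the output in the question.
sample = u'Re\ufeffverse \ufeffFleece \ufeffHoo\ufeffded S\ufeffwea\ufefftshi\ufeffrt'
cleaned = strip_bom(sample)
print(cleaned)  # Reverse Fleece Hooded Sweatshirt
```

With this helper, the parsing step would presumably become lxml.html.fromstring(strip_bom(page.text)), so the XPath queries never see the BOM characters in the first place.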
