Remove BOM characters from <HtmlElement> (Python)

Question

I am trying to load the HTML markup from a URL this way and then run some XPath queries, but the page source is loaded with BOM characters (U+FEFF). How do I remove them before I run the XPath queries?

import requests
import lxml.html

session = requests.Session()
page = session.get(url)
page_data = lxml.html.fromstring(page.text)

Output:

 u'Re\ufeffverse \ufeffFleece \ufeffHoo\ufeffded S\ufeffwea\ufefftshi\ufeffrt'

Answer 1

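import requests
import lxml.html

session = requests.Session()
page = session.get(url)
page_data = lxml.html.fromstring(page.text)

# tostring() returns ASCII bytes by default, so U+FEFF appears as &#65279;
markup = lxml.html.tostring(page_data).replace(b'&#65279;', b'')

# Re-parse the cleaned markup before running the XPath queries
page_data = lxml.html.fromstring(markup)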
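
An alternative that avoids serializing and re-parsing the tree is to strip the U+FEFF characters from the response text before it is parsed at all. The following is only a sketch of that idea, not part of the original answer: the helper name strip_bom is hypothetical, and the sample string is the output shown in the question.

```python
# Hypothetical helper (not from the original answer): remove every
# U+FEFF (BOM / zero-width no-break space) character from a string.
BOM = "\ufeff"


def strip_bom(text):
    """Return text with all U+FEFF characters removed."""
    return text.replace(BOM, "")


# Sample taken from the output shown in the question.
sample = "Re\ufeffverse \ufeffFleece \ufeffHoo\ufeffded S\ufeffwea\ufefftshi\ufeffrt"
cleaned = strip_bom(sample)
print(cleaned)  # Reverse Fleece Hooded Sweatshirt
```

In the question's code this would presumably be used as lxml.html.fromstring(strip_bom(page.text)), so that the XPath queries never see the stray characters.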

Answered 2018-03-26 14:46:46
