How to parse HTML by class name using lxml.etree in Python

Question

import requests
from lxml import etree
req = requests.get(url)
tree = etree.HTML(req.text)

Instead of using XPath (tree.xpath(...)), I would like to know whether lxml can search by class name or id, the way BeautifulSoup does with soup.find('div', attrs={'class': 'myclass'}). I'm looking for something similar in lxml.

Answer 1

A far more concise way to do that in bs4 is to use a CSS selector:

soup.select('div.myclass') #  == soup.find_all('div',attrs={'class':'myclass'})

lxml provides cssselect both as a module (which compiles CSS selectors into XPath expressions) and as a convenience method on Element objects. Both rely on the separate cssselect package being installed.

import lxml.html

tree = lxml.html.fromstring(req.text)
for div in tree.cssselect('div.myclass'):
    print(div.text_content())  # replace with your own processing

Alternatively, you can pre-compile the expression and apply it to your Element:

from lxml.cssselect import CSSSelector
selector = CSSSelector('div.myclass')

selection = selector(tree)

Answer 2

You say that you don't want to use XPath but don't explain why. If the goal is to search for a tag with a given class, you can do that easily with XPath.

For example, to find a div with the class "foo" you could do something like this:

# find() on an element needs a relative path; an absolute "//div..." raises SyntaxError
tree.find(".//div[@class='foo']")
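
One caveat with @class='foo' is that it compares the whole attribute value, so an element with class="foo bar" is not matched. A common XPath idiom for matching a single class token is sketched below; the sample markup is made up for illustration.

```python
from lxml import etree

# Hypothetical sample markup.
html = """
<html><body>
  <div class="foo">exact</div>
  <div class="foo bar">multi</div>
  <div class="foobar">other</div>
</body></html>
"""

tree = etree.HTML(html)

# Exact attribute comparison: only class="foo" matches.
exact = tree.xpath("//div[@class='foo']")

# Token match: pad the class list with spaces, then look for " foo ".
token = tree.xpath(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' foo ')]"
)

# find() returns only the first match and takes a relative path.
first = tree.find(".//div[@class='foo']")

exact_texts = [div.text for div in exact]
token_texts = [div.text for div in token]
print(exact_texts, token_texts, first.text)
```

The token form matches class="foo" and class="foo bar" but not class="foobar", which is the same behaviour a CSS class selector gives.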

Answer 1: 2014-05-12 17:45:44
Answer 2: 2016-01-28 18:27:07
