How to parse HTML by class name in lxml.etree (Python)
import requests
from lxml import etree
req = requests.get(url)
tree = etree.HTML(req.text)
Now, instead of using XPath (tree.xpath(...)), I would like to know whether we can search by class name or id, as we do in BeautifulSoup with soup.find('div', attrs={'class': 'myclass'}).
I'm looking for something similar in lxml.
The far more concise way to do that in bs4 is to use a CSS selector:
soup.select('div.myclass') # == soup.find_all('div',attrs={'class':'myclass'})
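For comparison, the same find_all-style lookup can be done in plain lxml without XPath or CSS by walking the tree and checking class tokens in Python. This is a minimal sketch: find_all_by_class is a hypothetical helper, not an lxml API, and the HTML string stands in for req.text.

```python
from lxml import etree

# Hypothetical helper (not part of lxml): a find_all-style search that walks
# the tree in Python instead of using XPath or CSS selectors.
def find_all_by_class(root, tag, class_name):
    """Return every <tag> element whose class attribute contains class_name
    as a whole whitespace-separated token, like soup.find_all does."""
    return [
        el for el in root.iter(tag)
        if class_name in (el.get("class") or "").split()
    ]

# Stand-in for req.text so the example runs without a network request.
html = '<div class="myclass big">A</div><div class="other">B</div><p class="myclass">C</p>'
tree = etree.HTML(html)
matches = find_all_by_class(tree, "div", "myclass")
print([el.text for el in matches])  # ['A']
```

Splitting the class attribute on whitespace is what makes class="myclass big" match, which a plain string comparison would miss.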
lxml provides cssselect both as a module (which actually compiles CSS selectors into XPath expressions) and as a convenience method on Element objects. It relies on the separate cssselect package (pip install cssselect).
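import lxml.html
tree = lxml.html.fromstring(req.text)  # req is the requests response from above
for div in tree.cssselect('div.myclass'):
    # stuff: process each matching <div> here
    print(div.text_content())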
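If installing cssselect is not an option, elements parsed with lxml.html also carry find_class() and get_element_by_id(), which cover the "class name or id" lookups from the question directly. A short sketch, assuming an inline HTML string in place of req.text:

```python
import lxml.html

html = """
<html><body>
  <div id="main" class="myclass wide">First</div>
  <span class="myclass">Second</span>
  <div class="other">Third</div>
</body></html>
"""
tree = lxml.html.fromstring(html)

# find_class() matches any element whose class list contains the token,
# regardless of tag, so filter on .tag to mimic soup.find_all('div', ...).
divs = [el for el in tree.find_class("myclass") if el.tag == "div"]
print([el.text for el in divs])  # ['First']

# get_element_by_id() raises KeyError when the id is missing,
# unless a default is passed as the second argument.
main = tree.get_element_by_id("main")
print(main.text)  # First
missing = tree.get_element_by_id("nope", None)
print(missing)  # None
```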
Alternatively, you can pre-compile the expression and apply it to your Element:
from lxml.cssselect import CSSSelector
selector = CSSSelector('div.myclass')  # compiled once, reusable
selection = selector(tree)  # list of matching elements
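The same compile-once idea works with plain XPath through etree.XPath, which also accepts variables at call time. This sketch assumes a class token passed in as $cls (the variable name and sample HTML are illustrative); the concat/normalize-space expression is the usual XPath idiom for matching one whole class token, roughly what a CSS class selector expresses.

```python
from lxml import etree

# Compile once, reuse many times (the plain-XPath analogue of CSSSelector).
# $cls is an XPath variable supplied as a keyword argument at call time.
# Padding with spaces makes cls match only as a whole class token.
divs_with_class = etree.XPath(
    "//div[contains(concat(' ', normalize-space(@class), ' '),"
    " concat(' ', $cls, ' '))]"
)

tree = etree.HTML('<div class="myclass x">1</div><div class="myclassy">2</div>')
print([d.text for d in divs_with_class(tree, cls="myclass")])  # ['1']
```

Passing the class as a variable avoids building XPath strings by hand, so quotes in the class name cannot break the expression.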
You say that you don't want to use XPath but don't explain why. If the goal is to search for a tag with a given class, you can do that easily with XPath.
For example, to find a div with the class "foo" you could do something like this:
tree.xpath("//div[@class='foo']")  # every matching div
tree.find(".//div[@class='foo']")  # first match only; find() needs a relative path
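Note that @class='foo' is an exact string comparison, so it skips elements that carry more than one class. A small sketch on made-up HTML comparing the exact test, a naive substring test, and the space-padded token test:

```python
from lxml import etree

tree = etree.HTML(
    '<div class="foo">a</div>'
    '<div class="foo bar">b</div>'
    '<div class="foobar">c</div>'
)

def texts(expr):
    """Run an XPath query and return the text of each matched element."""
    return [el.text for el in tree.xpath(expr)]

# Exact string comparison: misses the div that has a second class.
exact = texts("//div[@class='foo']")
# Substring test: also matches the unrelated "foobar" class.
substring = texts("//div[contains(@class, 'foo')]")
# Token test: pad with spaces so only the whole word "foo" matches.
token = texts("//div[contains(concat(' ', normalize-space(@class), ' '), ' foo ')]")

print(exact)      # ['a']
print(substring)  # ['a', 'b', 'c']
print(token)      # ['a', 'b']
```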