How do I select an element with the exact class using cssselect in lxml?
I'm scraping a website with lxml.html, but I've run into a problem. For example, when I make a selection like this:
html.cssselect('a.asig')
I expect to get only the elements with class="asig", but the selection also returns elements whose class attribute contains "asig" alongside other classes, for example:
<a class="asig drcha" ...>
What can I do to get only the elements whose class is exactly "asig", and not the elements whose class merely contains "asig"? Thanks!
Either use html.xpath and adjust accordingly, or be very explicit when declaring the class to locate. See the following code.
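from lxml import html

# Bytes literal: lxml can reject a str that carries an XML encoding declaration.
sample = b'<?xml version="1.0" encoding="UTF-8"?><root><a class="asig">I am the correct one.</a><a class="asig drcha">I am the wrong one.</a></root>'
tree = html.fromstring(sample)
# XPath: compare the whole class attribute to 'asig'.
print(tree.xpath("//a[@class='asig']/text()")[0])
# cssselect: attribute selector instead of the '.asig' class selector.
print(tree.cssselect("a[class='asig']")[0].text)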
The result is as follows:
I am the correct one.
I am the correct one.
[Finished in 0.2s]
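Notice how cssselect was used in the last line: the attribute selector a[class='asig'] compares the entire class attribute to "asig", whereas a.asig matches any element that has "asig" among its classes. Hope this helps.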