How do I select an element with the exact class using cssselect in lxml?

I'm scraping a website with lxml.html, but I've run into a problem. When I select elements from the HTML, for example:

 html.cssselect('a.asig')

I expect to get only the elements with class="asig", but the selection also returns elements whose class attribute contains "asig" alongside other classes, for example:

<a class="asig drcha" ...>

What can I do to get only the elements whose class is exactly "asig", and not the elements whose class merely contains "asig"? Thanks!

Either use html.xpath and adjust the expression accordingly, or be very explicit when declaring the class to locate: a.asig matches any element whose class list includes asig, whereas an attribute selector compares the whole class value. See the following code.

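from lxml import html

# Parsed as an HTML fragment; the XML declaration is dropped because lxml
# rejects str input that carries an encoding declaration under Python 3.
sample = '<root><a class="asig">I am the correct one.</a><a class="asig drcha">I am the wrong one.</a></root>'
tree = html.fromstring(sample)

# XPath: compare the whole class attribute against the exact string 'asig'.
print(tree.xpath("//a[@class='asig']/text()")[0])
# cssselect: the attribute selector performs the same exact-string comparison.
print(tree.cssselect("a[class='asig']")[0].text)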

The result is as follows:

I am the correct one.
I am the correct one.

Notice how cssselect was used in the last line. Hope this helps.
