How to use lxml to find an element by text?
Assume we have the following HTML:
<html>
<body>
<a href="/1234.html">TEXT A</a>
<a href="/3243.html">TEXT B</a>
<a href="/7445.html">TEXT C</a>
</body>
</html>
How do I make it find the "a" element that contains "TEXT A"?
So far I've got:
root = lxml.html.document_fromstring(the_html_above)
e = root.find('.//a')
I've tried:
e = root.find('.//a[@text="TEXT A"]')
but that didn't work, as the "a" tags have no attribute "text".
Is there any way I can solve this in a similar fashion to what I've tried?
You are very close. Use text()= rather than @text (which indicates an attribute).
e = root.xpath('.//a[text()="TEXT A"]')
Or, if you know only that the text contains "TEXT A",
e = root.xpath('.//a[contains(text(),"TEXT A")]')
Or, if you know only that the text starts with "TEXT A",
e = root.xpath('.//a[starts-with(text(),"TEXT A")]')
See the docs for more on the available string functions.
For example,
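To see how the three predicates differ, here is a small sketch. The sample markup is made up for illustration (the second link with padded text is not from the question), and the last query uses normalize-space(), a standard XPath 1.0 function, as one possible way to ignore surrounding whitespace.

```python
import lxml.html

# Illustrative markup: the second link has padded text and is an assumption,
# added only to show where the predicates behave differently.
html = '''
<html><body>
<a href="/1234.html">TEXT A</a>
<a href="/3243.html">  TEXT A plus more </a>
<a href="/7445.html">TEXT C</a>
</body></html>
'''
root = lxml.html.fromstring(html)

# Exact match on the element's text node
exact = root.xpath('.//a[text()="TEXT A"]')
# Substring match
partial = root.xpath('.//a[contains(text(), "TEXT A")]')
# Prefix match; leading whitespace in the text counts
prefix = root.xpath('.//a[starts-with(text(), "TEXT A")]')
# Prefix match after trimming and collapsing whitespace
trimmed = root.xpath('.//a[starts-with(normalize-space(), "TEXT A")]')

for name, found in [('exact', exact), ('partial', partial),
                    ('prefix', prefix), ('trimmed', trimmed)]:
    print(name, [a.get('href') for a in found])
```

Note that in XPath 1.0, passing text() to a string function uses only the first text node of the element, so normalize-space() or "." may be the better choice when a link contains nested tags.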
import lxml.html as LH
text = '''\
<html>
<body>
<a href="/1234.html">TEXT A</a>
<a href="/3243.html">TEXT B</a>
<a href="/7445.html">TEXT C</a>
</body>
</html>'''
root = LH.fromstring(text)
e = root.xpath('.//a[text()="TEXT A"]')
print(e)
yields
[<Element a at 0xb746d2cc>]
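As the output shows, xpath() returns a list rather than a single element, so it may be worth guarding against an empty result. The sketch below wraps that in a helper; find_link_by_text is a hypothetical name, not part of lxml. It also passes the search text as an XPath variable (a keyword argument to xpath()), which avoids quoting problems when the text itself contains quotes.

```python
import lxml.html

html = ('<html><body>'
        '<a href="/1234.html">TEXT A</a>'
        '<a href="/3243.html">TEXT B</a>'
        '</body></html>')
root = lxml.html.fromstring(html)


def find_link_by_text(root, text):
    """Hypothetical helper: first <a> whose text equals `text`, or None."""
    # $wanted is an XPath variable, bound by the keyword argument below
    matches = root.xpath('.//a[text()=$wanted]', wanted=text)
    return matches[0] if matches else None


link = find_link_by_text(root, 'TEXT A')
if link is not None:
    print(link.text, link.get('href'))

# No match: the helper returns None instead of raising IndexError
print(find_link_by_text(root, 'TEXT Z'))
```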
Another way that looks more straightforward to me:
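import lxml.html

results = []
root = lxml.html.fromstring(the_html_above)
for tag in root.iter():
    # tag.text is None for elements with no leading text, so check it first
    if tag.text and "TEXT A" in tag.text:
        results.append(tag)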
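A variation on the loop approach, as a sketch: root.iter('a') limits the scan to "a" elements, and text_content() (available on lxml.html elements) also picks up text inside nested tags. The markup here is invented for illustration.

```python
import lxml.html

# Illustrative markup: the first link wraps part of its text in <b>,
# and the <p> contains the same words but is not a link.
html = ('<html><body>'
        '<a href="/1.html"><b>TEXT</b> A</a>'
        '<a href="/2.html">TEXT B</a>'
        '<p>TEXT A in a paragraph</p>'
        '</body></html>')
root = lxml.html.fromstring(html)

# Only <a> elements are visited; text_content() joins all descendant text
results = [a for a in root.iter('a') if 'TEXT A' in a.text_content()]

print([a.get('href') for a in results])
```

Checking a.text alone would miss the first link here, because its leading text lives inside the nested "b" element.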