How do I find an element by its text using lxml?

Question

Assume we have the following HTML:

<html>
    <body>
        <a href="/1234.html">TEXT A</a>
        <a href="/3243.html">TEXT B</a>
        <a href="/7445.html">TEXT C</a>
    </body>
</html>

How do I find the "a" element that contains "TEXT A"?

So far I've got:

root = lxml.html.document_fromstring(the_html_above)
e = root.find('.//a')

I've tried:

e = root.find('.//a[@text="TEXT A"]')

but that didn't work, as the "a" tags have no attribute named "text".

Is there a way to solve this in a fashion similar to what I've tried?

Answer 1

You are very close. Use text()= rather than @text (which indicates an attribute).

e = root.xpath('.//a[text()="TEXT A"]')

Or, if you know only that the text contains "TEXT A":

e = root.xpath('.//a[contains(text(),"TEXT A")]')

Or, if you know only that the text starts with "TEXT A":

e = root.xpath('.//a[starts-with(text(),"TEXT A")]')

See the docs for more on the available string functions.

For example,

import lxml.html as LH

text = '''\
<html>
    <body>
        <a href="/1234.html">TEXT A</a>
        <a href="/3243.html">TEXT B</a>
        <a href="/7445.html">TEXT C</a>
    </body>
</html>'''

root = LH.fromstring(text)
e = root.xpath('.//a[text()="TEXT A"]')
print(e)

yields

[<Element a at 0xb746d2cc>]
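
As a follow-up sketch: xpath() returns a list rather than a single element, so the match usually has to be unpacked before use. The snippet below reuses the sample markup, with the third link deliberately padded with whitespace as an assumption for illustration; the names matches, first, exact and trimmed are illustrative only. It also tries normalize-space(), a standard XPath 1.0 string function, which may help when the link text carries stray whitespace.

```python
import lxml.html as LH

# Same sample markup, except the third link's text is padded with whitespace.
text = '''\
<html>
    <body>
        <a href="/1234.html">TEXT A</a>
        <a href="/3243.html">TEXT B</a>
        <a href="/7445.html"> TEXT C
        </a>
    </body>
</html>'''

root = LH.fromstring(text)

# xpath() returns a list, so take the first match explicitly.
matches = root.xpath('.//a[text()="TEXT A"]')
first = matches[0] if matches else None
if first is not None:
    print(first.text, first.get('href'))

# An exact comparison does not match padded text ...
exact = root.xpath('.//a[text()="TEXT C"]')
# ... while normalize-space() trims and collapses whitespace before comparing.
trimmed = root.xpath('.//a[normalize-space(text())="TEXT C"]')
print(len(exact), len(trimmed))
```

Whether trimming is wanted depends on the page; for exact, well-formed text the plain text()= form is enough.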

Answer 2

Another way that looks more straightforward to me:

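import lxml.html

results = []
root = lxml.html.fromstring(the_html_above)
for tag in root.iter():
    # tag.text is None for elements without leading text, so check it first
    if tag.text and "TEXT A" in tag.text:
        results.append(tag)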
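
The loop above could also be wrapped in a small helper. This is only a sketch: find_by_text and its parameters are hypothetical names, not part of lxml, and the markup is a shortened version of the sample from the question. It restricts iteration to one tag name with iter(tag) and guards against elements whose .text is None.

```python
import lxml.html


def find_by_text(root, needle, tag="a"):
    """Return elements named `tag` whose own leading text contains `needle`.

    Hypothetical helper for illustration; not an lxml API.
    """
    results = []
    for el in root.iter(tag):
        # el.text is None when the element has no leading text.
        if el.text and needle in el.text:
            results.append(el)
    return results


html = (
    '<html><body>'
    '<a href="/1234.html">TEXT A</a>'
    '<a href="/3243.html">TEXT B</a>'
    '</body></html>'
)
root = lxml.html.fromstring(html)
found = find_by_text(root, "TEXT A")
print([el.get("href") for el in found])
```

Compared with the XPath version, this trades brevity for plain Python control flow, which may be easier to extend with custom matching rules.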
