Using XPath to select both the text in p elements and the 'src' attribute of img children of p

Question

I need to extract both the text and the link held in the "src" attribute of the "img" element from this HTML:

<div>

    <p> Line 1</p>

    <p>
        <img src="https://example.com/abc">
    </p>

    <p> Line 2</p>

</div>

The output I want is:


# they must be in the correct order like this

[Line 1, https://example.com/abc, Line 2]

I tried a few approaches, but they all failed:

xpath: //p/text() | //p/img/@src
Output: [ Line 1, Line 2, https://example.com/abc ]

Failed because the results were in the wrong order

xpath: //p/( text(), img/@src)
Output: xpath not valid

Answer 1

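Assuming you want all leading and trailing whitespace removed (based on your expected result), and that you don't want any text content from the p that contains the img (there is a lot of whitespace before and after the img), you could try an expression like this: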

//div/p/(normalize-space(string()), string(img/@src))[string-length(.)>0]

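This should preserve the order by combining the node and attribute string contents into a sequence of atomic strings, and then keeping only the strings whose length is greater than zero.

normalize-space(string()) selects the content of the p element as a string, stripping leading and trailing whitespace (and collapsing internal runs of whitespace). string(img/@src) selects the content of the src attribute of any img element that is a direct child of the p. [string-length(.)>0] is a predicate on the resulting sequence that eliminates any zero-length strings.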
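
Note that a parenthesized sequence used as a path step, as in the expression above, is an XPath 2.0 feature, so an XPath 1.0 engine will reject it. For such environments, the same per-p logic can be imitated in host code. Below is a minimal Python sketch using only the standard library. The function name extract_in_order is hypothetical, and the img tag in the sample is self-closed (an adjustment for this sketch) so that the XML parser accepts it.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question. Assumption: the img tag is self-closed
# here so that the stdlib XML parser can read it.
HTML = """
<div>
    <p> Line 1</p>
    <p>
        <img src="https://example.com/abc"/>
    </p>
    <p> Line 2</p>
</div>
"""


def extract_in_order(markup):
    """Return p text and img src values interleaved in document order."""
    root = ET.fromstring(markup)
    results = []
    # Visit every p element in document order.
    for p in root.iter("p"):
        # Roughly normalize-space(string()): join all descendant text,
        # then trim and collapse whitespace.
        text = " ".join("".join(p.itertext()).split())
        # Roughly string(img/@src): src of the first img child, or "".
        img = p.find("img")
        src = img.get("src", "") if img is not None else ""
        # Roughly [string-length(.) > 0]: drop empty strings.
        results.extend(s for s in (text, src) if s)
    return results


print(extract_in_order(HTML))
# prints ['Line 1', 'https://example.com/abc', 'Line 2']
```

Because each p contributes its text first and its src second, the interleaving follows the order of the p elements, which is what the sequence expression achieves inside XPath itself.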

Solution 1
1 Accepted 2021-06-02 15:38:17