ElementTree: text mixed with tags

Question

Consider the following text:

<description>
the thing <b>stuff</b> is very important for various reasons, notably <b>other things</b>.
</description>

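How would I parse this using the etree interface? Given the description element, the .text attribute returns only the leading text before the first child element ("the thing"). The .getchildren() method returns the <b> elements, but not the rest of the text.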

Many thanks!

Answer 1

Use .text_content(). A working example using lxml.html:

from lxml.html import fromstring

data = """
<description>
the thing <b>stuff</b> is very important for various reasons, notably <b>other things</b>.
</description>
"""

tree = fromstring(data)

print(tree.xpath("//description")[0].text_content().strip())

Prints:

the thing stuff is very important for various reasons, notably other things.
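
For readers on the standard library rather than lxml, a rough equivalent of .text_content() is to join the strings yielded by itertext(). This is only a sketch and assumes the input is well-formed XML; the variable name flat is made up for illustration:

```python
import xml.etree.ElementTree as ET

data = """
<description>
the thing <b>stuff</b> is very important for various reasons, notably <b>other things</b>.
</description>
"""

root = ET.fromstring(data)

# itertext() walks the element and its descendants, yielding every text
# and tail string in document order; joining them flattens the markup.
flat = "".join(root.itertext()).strip()
print(flat)
```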

I forgot to specify one thing, sorry. My ideal parsed version would contain a list of segments: [normal("the thing"), bold("stuff"), normal("....")]. Is that possible with the lxml.html library?

Assuming the description contains only text nodes and b elements:

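for item in tree.xpath("//description/*|//description/text()"):
    # text() results are str subclasses; anything else is a <b> element
    # (basestring was Python 2 only, so str is used here)
    print([item.strip(), 'normal'] if isinstance(item, str) else [item.text, 'bold'])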

Prints:

['the thing', 'normal']
['stuff', 'bold']
['is very important for various reasons, notably', 'normal']
['other things', 'bold']
['.', 'normal']
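
Since the question mentions .text and child elements directly, here is a sketch of the same segmentation with the standard-library xml.etree.ElementTree, which stores the text following a child element in that child's .tail attribute. The helper name segments is hypothetical, and the code assumes, like the XPath version, that description contains only text and flat b elements:

```python
import xml.etree.ElementTree as ET

data = """
<description>
the thing <b>stuff</b> is very important for various reasons, notably <b>other things</b>.
</description>
"""

def segments(elem):
    """Return [text, kind] pairs for elem's text and children, in document order."""
    result = []
    # .text is only the text before the first child element.
    if elem.text and elem.text.strip():
        result.append([elem.text.strip(), 'normal'])
    for child in elem:
        # Text inside the child itself (assumed to be a flat <b>).
        result.append([(child.text or '').strip(), 'bold'])
        # .tail is the text between this child's end tag and the next sibling.
        if child.tail and child.tail.strip():
            result.append([child.tail.strip(), 'normal'])
    return result

root = ET.fromstring(data)
for seg in segments(root):
    print(seg)
```

This should produce the same five pairs as the lxml version. The .tail attribute is the piece that .text and the child list alone leave out.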
