
ElementTree text mixed with tags

Imagine the following text:

<description>
the thing <b>stuff</b> is very important for various reasons, notably <b>other things</b>.
</description>

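How would I go about parsing this with the etree interface? Given the description element, the .text property returns only the text before the first <b> element, and the .getchildren() method returns the <b> elements but not the rest of the text.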

Many thanks!

Use .text_content(). A working example using lxml.html:

from lxml.html import fromstring   

data = """
<description>
the thing <b>stuff</b> is very important for various reasons, notably <b>other things</b>.
</description>
"""

tree = fromstring(data)

print(tree.xpath("//description")[0].text_content().strip())

Prints:

the thing stuff is very important for various reasons, notably other things.
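
For comparison, the same flattened text can probably be obtained with the standard library alone. This is a minimal sketch, assuming the input is well-formed XML (xml.etree.ElementTree is stricter than lxml.html); the helper name flatten_text is made up for illustration:

```python
import xml.etree.ElementTree as ET

data = """
<description>
the thing <b>stuff</b> is very important for various reasons, notably <b>other things</b>.
</description>
"""


def flatten_text(xml_string):
    """Return all text inside the root element, with the tags dropped."""
    root = ET.fromstring(xml_string)
    # itertext() yields the .text and .tail strings of the element and
    # its descendants in document order
    return "".join(root.itertext()).strip()


print(flatten_text(data))
```

Unlike text_content(), this needs no third-party package, but it will raise a ParseError on markup that is not valid XML.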

I forgot to specify one thing, sorry. My ideal parsed version would contain a list of subsections: [normal("the thing"), bold("stuff"), normal("....")]. Is that possible with the lxml.html library?

Assuming the description contains only text nodes and b elements:

for item in tree.xpath("//description/*|//description/text()"):
    # text() results are str subclasses; everything else is a <b> element
    print([item.strip(), 'normal'] if isinstance(item, str) else [item.text, 'bold'])

Prints:

['the thing', 'normal']
['stuff', 'bold']
['is very important for various reasons, notably', 'normal']
['other things', 'bold']
['.', 'normal']
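
The reason .text looked incomplete in the question is that ElementTree-style APIs store the text that follows a child element in that child's .tail attribute. The sketch below walks .text and .tail directly, using the standard-library xml.etree.ElementTree. The function name split_segments and the (text, style) tuple format are illustrative choices, and it assumes only direct <b> children with no nesting:

```python
import xml.etree.ElementTree as ET


def split_segments(element):
    """Return (text, style) pairs for an element's text and its direct children."""
    segments = []
    # Text before the first child lives in element.text
    if element.text and element.text.strip():
        segments.append((element.text.strip(), "normal"))
    for child in element:
        # Treat <b> children as bold; anything else falls back to normal
        style = "bold" if child.tag == "b" else "normal"
        if child.text and child.text.strip():
            segments.append((child.text.strip(), style))
        # Text after the child's closing tag lives in child.tail
        if child.tail and child.tail.strip():
            segments.append((child.tail.strip(), "normal"))
    return segments


root = ET.fromstring(
    "<description>the thing <b>stuff</b> is very important for various "
    "reasons, notably <b>other things</b>.</description>"
)
for segment in split_segments(root):
    print(segment)
```

The same .text and .tail attributes exist on lxml elements, so the function should also work on the tree parsed with lxml.html above.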
