ElementTree text mixed with tags
Imagine the following text:
<description>
the thing <b>stuff</b> is very important for various reasons, notably <b>other things</b>.
</description>
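How would I go about parsing this using the etree interface? Given the description tag, the .text property returns only the first word, "the". The .getchildren() method returns the <b> elements, but not the rest of the text.
Many thanks!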
Use .text_content(). A working example using lxml.html:
from lxml.html import fromstring
data = """
<description>
the thing <b>stuff</b> is very important for various reasons, notably <b>other things</b>.
</description>
"""
tree = fromstring(data)
print(tree.xpath("//description")[0].text_content().strip())
Prints:
the thing stuff is very important for various reasons, notably other things.
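For comparison, the same flattened text can probably be obtained without lxml. The sketch below swaps in the standard library's xml.etree.ElementTree, whose itertext() yields every text fragment in document order. It is not part of the answer above, and the helper name flatten_text is made up for illustration.

```python
import xml.etree.ElementTree as ET

data = """
<description>
the thing <b>stuff</b> is very important for various reasons, notably <b>other things</b>.
</description>
"""


def flatten_text(element):
    # Hypothetical helper: join all text fragments (element text and
    # child tails) in document order, then trim surrounding whitespace.
    return "".join(element.itertext()).strip()


# Strip the leading newline so the parser sees the root element first.
root = ET.fromstring(data.strip())
flat = flatten_text(root)
print(flat)
```

This only recovers the plain text; the information about which parts were bold is lost, just as with .text_content().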
I forgot to specify one thing, sorry. My ideal parsed version would contain a list of subsections: [normal("the thing"), bold("stuff"), normal("....")]. Is that possible with the lxml.html library?
Assuming the description contains only text nodes and b elements:
for item in tree.xpath("//description/*|//description/text()"):
    # text() results are strings; the <b> children are element objects
    print([item.strip(), 'normal'] if isinstance(item, str) else [item.text, 'bold'])
Prints:
['the thing', 'normal']
['stuff', 'bold']
['is very important for various reasons, notably', 'normal']
['other things', 'bold']
['.', 'normal']
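The reason .text alone is not enough is that, in the ElementTree model, an element's .text holds only the text before its first child, while the text following each child is stored on that child's .tail. The same segmentation can therefore be sketched without XPath by walking .text and .tail directly. This is a minimal sketch using the standard library's xml.etree.ElementTree rather than lxml.html; the function name split_segments is hypothetical, and it assumes, as above, that the only children are b elements containing plain text.

```python
import xml.etree.ElementTree as ET

data = """
<description>
the thing <b>stuff</b> is very important for various reasons, notably <b>other things</b>.
</description>
"""


def split_segments(element):
    """Return [text, kind] pairs for an element with mixed content.

    Assumption: every child is a <b> element holding plain text only.
    """
    segments = []
    # Text before the first child lives on the parent's .text.
    if element.text and element.text.strip():
        segments.append([element.text.strip(), "normal"])
    for child in element:
        segments.append([(child.text or "").strip(), "bold"])
        # Text after a child lives on that child's .tail.
        if child.tail and child.tail.strip():
            segments.append([child.tail.strip(), "normal"])
    return segments


root = ET.fromstring(data.strip())
segments = split_segments(root)
for segment in segments:
    print(segment)
```

Nested markup (for example a b inside an i) would need a recursive walk, which this sketch does not attempt.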