Python: Extract XML text except for certain tags

Question

I have this sample XML file:

<page>
  <title>Chapter 1</title>
  <content>Welcome to Chapter 1</content>
  <author>John Smith</author>
</page>
<page>
 <title>Chapter 2</title>
 <content>Welcome to Chapter 2</content>
 <author>John Doe</author>
</page>

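The XML may be nested more than two levels deep and may contain other tags. I want to extract all text except the text under the "content" tag, so that I get a list of strings like this: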

['Chapter 1', 'John Smith', 'Chapter 2', 'John Doe']

I am using ElementTree for this task. Is there an elegant, clean solution?

Answer 1

import bs4

xml = '''<page>
  <title>Chapter 1</title>
  <content>Welcome to Chapter 1</content>
  <author>John Smith</author>
</page>
<page>
 <title>Chapter 2</title>
 <content>Welcome to Chapter 2</content>
 <author>John Doe</author>
</page>'''

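# The lenient 'lxml' HTML parser accepts several top-level <page> elements
soup = bs4.BeautifulSoup(xml, 'lxml')
# soup('page') is shorthand for soup.find_all('page')
[(page.title.text, page.author.text) for page in soup('page')]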

Output:

[('Chapter 1', 'John Smith'), ('Chapter 2', 'John Doe')]

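This uses BeautifulSoup as the parser; see its documentation for details. Note that it returns one (title, author) tuple per page rather than a flat list, and it only reads the title and author tags.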
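Since the question asks about ElementTree and a flat list, here is a minimal sketch of one way to do that with the standard library. The function name text_except and the synthetic <root> wrapper are assumptions made for illustration, not part of the original answer; the wrapper is only there because the sample has several top-level <page> elements, which ElementTree would otherwise reject.

```python
import xml.etree.ElementTree as ET

xml = '''<page>
  <title>Chapter 1</title>
  <content>Welcome to Chapter 1</content>
  <author>John Smith</author>
</page>
<page>
 <title>Chapter 2</title>
 <content>Welcome to Chapter 2</content>
 <author>John Doe</author>
</page>'''


def text_except(xml_fragment, excluded=("content",)):
    """Collect non-blank text from every element whose tag is not excluded.

    Hypothetical helper: an excluded element is skipped together with
    everything nested inside it.
    """
    # Wrap the fragment in a synthetic root (an assumption for this sample)
    # so that several top-level elements parse as one document.
    root = ET.fromstring(f"<root>{xml_fragment}</root>")
    texts = []

    def walk(elem):
        if elem.tag in excluded:
            return  # skip this element and its whole subtree
        if elem.text and elem.text.strip():
            texts.append(elem.text.strip())
        for child in elem:
            walk(child)
            # Tail text follows the child but belongs to the parent element.
            if child.tail and child.tail.strip():
                texts.append(child.tail.strip())

    walk(root)
    return texts


print(text_except(xml))
```

Because the walk is recursive, it handles any nesting depth and any tag names; pass a different excluded tuple to skip other tags.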

Solution 1
-1 2017-01-23 05:39:09
