Parsing XML with Python and printing a whole element

Question

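This is my first question here. Usually I can find what I need, but after a week of searching and trying I am still in the same place, so I need your help.

I have a book in a large XML file of more than 6000 lines. What I need to do is take a <sec> element and put its content into a string. Sometimes the element has just one paragraph, sometimes more, sometimes the paragraphs contain lists, and so on; I need to capture all of it in one string.

Here is an example of how the book is formatted.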

<book>
    <book-body>
        <book-part id="ch01" book-part-type="chapter">
            <book-part-meta>
                <title-group>
                    <label><target target-type="page" id="pg1"/>Chapter 1</label>
                    <title>Some Title</title>
                </title-group>
            </book-part-meta>
            <body>
                <sec id="ch01lev1sec1" disp-level="level1">
                    <title>Introduction</title>
                    <p>This is a <em>paragraph</em></p>
                    <p>This is second paragraph
                        <list list-type="bullet">
                        <list-item><p>List Item 1</p></list-item>
                        <list-item><p>List Item 2</p></list-item>
                        <list-item><p>List Item 3</p></list-item>
                        </list>
                    </p>
                </sec>
            </body>
        </book-part>
    </book-body>
</book>

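From this example, I need everything inside the <sec> tag (ideally without the title, but I will figure that out later). I have tried "xml.etree.ElementTree" and "minidom" without success.

Here is a sample of my code using minidom: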

from xml.dom import minidom

xmldoc = minidom.parse("xCHES.xml")

book = xmldoc.getElementsByTagName("book")[0]

sec = book.getElementsByTagName("sec")

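When I count the elements, I get the same number as when I search the XML file for " <sec ", so I assume I am getting all of them. After this point I am stuck: I cannot find a way to extract all the content as text.

It is the same with "ElementTree": I can find all the <sec> elements, but I either cannot extract the text or only get a small part of it.

So if anyone can help me with this, that would be great. The method doesn't matter, as long as it gets the job done.

Edit: the desired output is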

<title>Introduction</title>
<p>This is a <em>paragraph</em></p>
<p>This is second paragraph
    <list list-type="bullet">
    <list-item><p>List Item 1</p></list-item>
    <list-item><p>List Item 2</p></list-item>
    <list-item><p>List Item 3</p></list-item>
    </list>
</p>

but as a string. It can be on a single line; the formatting does not matter.

Thanks :)

Answer 1

Following @stovfl's answer to "How to get the inner content as a string using minidom from xml.dom?"

Maybe this works for you?

def getText(nodelist):
    # Walk all nodes and collect the data of every TEXT_NODE
    rc = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc.append(node.data)
        else:
            # Recurse into the children of non-text nodes
            rc.append(getText(node.childNodes))
    return ''.join(rc)


# Iterate over the <sec ...>...</sec> node list; `sec` is the result of
# book.getElementsByTagName("sec") in the question's code
for node in sec:
    print(getText(node.childNodes))

Output:

                Introduction
                This is a paragraph
                This is second paragraph

                    List Item 1
                    List Item 2
                    List Item 3
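The output above contains only the text. If the tags should be kept, as in the desired output in the question, one possible approach is to serialize each child node with minidom's toxml() and join the pieces. This is only a sketch: the helper name inner_xml and the shortened inline SAMPLE are assumptions made for illustration, not part of the original answer.

```python
from xml.dom import minidom

# Hypothetical, shortened stand-in for the book file from the question.
SAMPLE = (
    '<book><body><sec id="ch01lev1sec1" disp-level="level1">'
    '<title>Introduction</title>'
    '<p>This is a <em>paragraph</em></p>'
    '<p>This is second paragraph<list list-type="bullet">'
    '<list-item><p>List Item 1</p></list-item></list></p>'
    '</sec></body></book>'
)


def inner_xml(node):
    """Serialize every child of `node`, tags included, into one string."""
    return "".join(child.toxml() for child in node.childNodes)


xmldoc = minidom.parseString(SAMPLE)
# One string per <sec> element, without the enclosing <sec> tag itself.
sections = [inner_xml(sec) for sec in xmldoc.getElementsByTagName("sec")]
print(sections[0])
```

For the real file, minidom.parse("xCHES.xml") from the question would take the place of parseString(SAMPLE).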

Answer 2

I think BeautifulSoup can make your job easier.

Try this.

New file (new.xml)

<book>
    <book-body>
        <book-part id="ch01" book-part-type="chapter">
            <book-part-meta>
                <title-group>
                    <label><target target-type="page" id="pg1"/>Chapter 1</label>
                    <title>Some Title</title>
                </title-group>
            </book-part-meta>
            <body>
                <sec id="ch01lev1sec1" disp-level="level1">
                    <title>Introduction</title>
                    <p>This is a paragraph</p>
                    <p>This is second paragraph
                        <list list-type="bullet">
                        <list-item><p>List Item 1</p></list-item>
                        <list-item><p>List Item 2</p></list-item>
                        <list-item><p>List Item 3</p></list-item>
                        </list>
                    </p>
                </sec>
            </body>
        </book-part>
    </book-body>
</book>

Code

from bs4 import BeautifulSoup  # the "xml" parser below requires lxml to be installed

data = BeautifulSoup(open('new.xml', 'r'), 'xml')  # new.xml contains the XML data
data.find_all('sec')

The output looks like

[<sec disp-level="level1" id="ch01lev1sec1">
 <title>Introduction</title>
 <p>This is a paragraph</p>
 <p>This is second paragraph
                         <list list-type="bullet">
 <list-item><p>List Item 1</p></list-item>
 <list-item><p>List Item 2</p></list-item>
 <list-item><p>List Item 3</p></list-item>
 </list>
 </p>
 </sec>]

I think you can easily parse it from here. Ping me if you need help with the parsing.
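The question also mentions xml.etree.ElementTree. As a standard-library alternative, a minimal sketch of turning each <sec> element's inner markup into one string, optionally dropping the <title>, might look like this. The helper name inner_xml, its skip parameter and the shortened inline SAMPLE are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened stand-in for the book file from the question.
SAMPLE = (
    '<book><body><sec id="ch01lev1sec1" disp-level="level1">'
    '<title>Introduction</title>'
    '<p>This is a <em>paragraph</em></p>'
    '<p>This is second paragraph<list list-type="bullet">'
    '<list-item><p>List Item 1</p></list-item></list></p>'
    '</sec></body></book>'
)


def inner_xml(elem, skip=()):
    """Return the markup inside `elem` as a string, omitting child tags in `skip`."""
    parts = [elem.text or ""]  # text before the first child, if any
    for child in elem:
        if child.tag in skip:
            # Drop the child itself but keep any text that follows it.
            parts.append(child.tail or "")
        else:
            # tostring() serializes the child, its subtree and its tail text.
            parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


root = ET.fromstring(SAMPLE)
# One string per <sec>, here without the section title.
sections = [inner_xml(sec, skip=("title",)) for sec in root.iter("sec")]
print(sections[0])
```

Calling inner_xml(sec) without skip keeps the <title> as well; for the real file, ET.parse("xCHES.xml").getroot() would replace ET.fromstring(SAMPLE).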


2 solutions

Solution 1
2 2018-08-20 14:31:19

Solution 2
0 2018-08-20 14:04:51
