Read XML line by line without loading the whole file into memory

Question

This is the structure of my XML:

<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="4" PostTypeId="1" AcceptedAnswerId="7" CreationDate="2008-07-31T21:42:52.667" Score="756" ViewCount="63468" Body="&lt;p&gt;I want to use a &lt;code&gt;Track-Bar&lt;/code&gt; to change a &lt;code&gt;Form&lt;/code&gt;'s opacity.&lt;/p&gt;&#xA;&lt;p&gt;This is my code:&lt;/p&gt;&#xA;&lt;pre class=&quot;lang-cs prettyprint-override&quot;&gt;&lt;code&gt;decimal trans = trackBar1.Value / 5000;&#xA;this.Opacity = trans;&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;When I build the application, it gives the following error:&lt;/p&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;pre class=&quot;lang-none prettyprint-override&quot;&gt;&lt;code&gt;Cannot implicitly convert type decimal to double&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;p&gt;I have tried using &lt;code&gt;trans&lt;/code&gt; and &lt;code&gt;double&lt;/code&gt;, but then the &lt;code&gt;Control&lt;/code&gt; doesn't work. This code worked fine in a past VB.NET project.&lt;/p&gt;&#xA;" OwnerUserId="8" LastEditorUserId="3072350" LastEditorDisplayName="Rich B" LastEditDate="2021-02-26T03:31:15.027" LastActivityDate="2021-11-15T21:15:29.713" Title="How to convert a Decimal to a Double in C#?" Tags="&lt;c#&gt;&lt;floating-point&gt;&lt;type-conversion&gt;&lt;double&gt;&lt;decimal&gt;" AnswerCount="12" CommentCount="4" FavoriteCount="59" CommunityOwnedDate="2012-10-31T16:42:47.213" ContentLicense="CC BY-SA 4.0" />
  <row Id="6" PostTypeId="1" AcceptedAnswerId="31" CreationDate="2008-07-31T22:08:08.620" Score="313" ViewCount="22477" Body="&lt;p&gt;I have an absolutely positioned &lt;code&gt;div&lt;/code&gt; containing several children, one of which is a relatively positioned &lt;code&gt;div&lt;/code&gt;. When I use a &lt;code&gt;percentage-based width&lt;/code&gt; on the child &lt;code&gt;div&lt;/code&gt;, it collapses to &lt;code&gt;0 width&lt;/code&gt; on IE7, but not on Firefox or Safari.&lt;/p&gt;&#xA;&lt;p&gt;If I use &lt;code&gt;pixel width&lt;/code&gt;, it works. If the parent is relatively positioned, the percentage width on the child works.&lt;/p&gt;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;Is there something I'm missing here?&lt;/li&gt;&#xA;&lt;li&gt;Is there an easy fix for this besides the &lt;code&gt;pixel-based width&lt;/code&gt; on the child?&lt;/li&gt;&#xA;&lt;li&gt;Is there an area of the CSS specification that covers this?&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;" OwnerUserId="9" LastEditorUserId="9134576" LastEditorDisplayName="user14723686" LastEditDate="2021-01-29T18:46:45.963" LastActivityDate="2021-01-29T18:46:45.963" Title="Why did the width collapse in the percentage width child element in an absolutely positioned parent on Internet Explorer 7?" Tags="&lt;html&gt;&lt;css&gt;&lt;internet-explorer-7&gt;" AnswerCount="7" CommentCount="0" FavoriteCount="13" ContentLicense="CC BY-SA 4.0" />
</posts>

Can I load it row by row without loading the whole XML file into memory? For example, to print all the titles.

Answer 1

If the XML file is structured exactly as shown in the example, BeautifulSoup can be used to parse the relevant lines. Like this:

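from bs4 import BeautifulSoup as BS

with open('my.xml', encoding='utf-8') as xml:
    for line in map(str.strip, xml):
        # Only lines holding a complete <row .../> element are parsed.
        if line.startswith('<row'):
            # The 'lxml' HTML parser lowercases attribute names,
            # so the Title attribute is looked up as 'title'.
            soup = BS(line, 'lxml')
            if row := soup.find('row'):
                if title := row.get('title'):
                    print(title)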

Answer 2

"Lines" in XML are irrelevant; the relevant units are elements, attributes, start tags, end tags, and so on.

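A streaming parser (often called a SAX parser, although strictly speaking SAX is a Java API) delivers the document to the application incrementally, not one line at a time but one syntactic unit at a time.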

See, for example, the Python SAX parser.
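To make that concrete, here is a minimal sketch using Python's standard-library xml.sax module. The names TitleHandler, stream_titles and on_title are made up for illustration, and the sketch assumes the titles live in a Title attribute on row elements, as in the sample from the question.

```python
import xml.sax


class TitleHandler(xml.sax.ContentHandler):
    """Hypothetical handler: reports the Title attribute of each <row>."""

    def __init__(self, on_title=print):
        super().__init__()
        # Callback invoked once per title; printing is the default.
        self.on_title = on_title

    def startElement(self, name, attrs):
        # The parser calls this for every start tag as it streams past,
        # so only the current element's attributes are held at a time.
        if name == "row" and "Title" in attrs:
            self.on_title(attrs["Title"])


def stream_titles(source, on_title=print):
    """Parse source (a filename or a binary file object) incrementally."""
    xml.sax.parse(source, TitleHandler(on_title))
```

Calling stream_titles('my.xml') should print one title per row. Because the parser works on elements rather than lines, this does not depend on each row sitting on its own line.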

Answer 3

You could try something like this, where file is an already-open file object:

while line := file.readline():
    print(line, end='')

Answer 4

Yes, you can use open(), which returns a file object rather than reading the file's contents into RAM. So you want to do something like this:

with open('file_name') as file:
    for row in file:
        print(row)
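The loop above prints raw lines, while the question asks for the titles. A rough sketch of one way to get them, assuming every row is a complete, self-closing element on a single line (as in the sample), is to hand each such line to xml.etree.ElementTree. The names iter_titles and print_titles are hypothetical.

```python
import xml.etree.ElementTree as ET


def iter_titles(lines):
    """Yield the Title attribute of each single-line <row .../> element."""
    for line in lines:
        line = line.strip()
        if line.startswith("<row"):
            # Assumption: the line is a complete element, so it is
            # well-formed XML on its own and can be parsed in isolation.
            title = ET.fromstring(line).get("Title")
            if title is not None:
                yield title


def print_titles(path):
    """Read the file lazily, one line at a time, and print each title."""
    with open(path, encoding="utf-8") as file:
        for title in iter_titles(file):
            print(title)
```

Calling print_titles('file_name') keeps only one line in memory at a time. If a row ever spans several lines, ET.fromstring will raise a ParseError on the fragment, so this only suits files laid out like the example.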
