Read XML line by line without loading the whole file into memory

This is the structure of my XML:

<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="4" PostTypeId="1" AcceptedAnswerId="7" CreationDate="2008-07-31T21:42:52.667" Score="756" ViewCount="63468" Body="&lt;p&gt;I want to use a &lt;code&gt;Track-Bar&lt;/code&gt; to change a &lt;code&gt;Form&lt;/code&gt;'s opacity.&lt;/p&gt;&#xA;&lt;p&gt;This is my code:&lt;/p&gt;&#xA;&lt;pre class=&quot;lang-cs prettyprint-override&quot;&gt;&lt;code&gt;decimal trans = trackBar1.Value / 5000;&#xA;this.Opacity = trans;&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;When I build the application, it gives the following error:&lt;/p&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;pre class=&quot;lang-none prettyprint-override&quot;&gt;&lt;code&gt;Cannot implicitly convert type decimal to double&#xA;&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;p&gt;I have tried using &lt;code&gt;trans&lt;/code&gt; and &lt;code&gt;double&lt;/code&gt;, but then the &lt;code&gt;Control&lt;/code&gt; doesn't work. This code worked fine in a past VB.NET project.&lt;/p&gt;&#xA;" OwnerUserId="8" LastEditorUserId="3072350" LastEditorDisplayName="Rich B" LastEditDate="2021-02-26T03:31:15.027" LastActivityDate="2021-11-15T21:15:29.713" Title="How to convert a Decimal to a Double in C#?" Tags="&lt;c#&gt;&lt;floating-point&gt;&lt;type-conversion&gt;&lt;double&gt;&lt;decimal&gt;" AnswerCount="12" CommentCount="4" FavoriteCount="59" CommunityOwnedDate="2012-10-31T16:42:47.213" ContentLicense="CC BY-SA 4.0" />
  <row Id="6" PostTypeId="1" AcceptedAnswerId="31" CreationDate="2008-07-31T22:08:08.620" Score="313" ViewCount="22477" Body="&lt;p&gt;I have an absolutely positioned &lt;code&gt;div&lt;/code&gt; containing several children, one of which is a relatively positioned &lt;code&gt;div&lt;/code&gt;. When I use a &lt;code&gt;percentage-based width&lt;/code&gt; on the child &lt;code&gt;div&lt;/code&gt;, it collapses to &lt;code&gt;0 width&lt;/code&gt; on IE7, but not on Firefox or Safari.&lt;/p&gt;&#xA;&lt;p&gt;If I use &lt;code&gt;pixel width&lt;/code&gt;, it works. If the parent is relatively positioned, the percentage width on the child works.&lt;/p&gt;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;Is there something I'm missing here?&lt;/li&gt;&#xA;&lt;li&gt;Is there an easy fix for this besides the &lt;code&gt;pixel-based width&lt;/code&gt; on the child?&lt;/li&gt;&#xA;&lt;li&gt;Is there an area of the CSS specification that covers this?&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;" OwnerUserId="9" LastEditorUserId="9134576" LastEditorDisplayName="user14723686" LastEditDate="2021-01-29T18:46:45.963" LastActivityDate="2021-01-29T18:46:45.963" Title="Why did the width collapse in the percentage width child element in an absolutely positioned parent on Internet Explorer 7?" Tags="&lt;html&gt;&lt;css&gt;&lt;internet-explorer-7&gt;" AnswerCount="7" CommentCount="0" FavoriteCount="13" ContentLicense="CC BY-SA 4.0" />
</posts>

Can I load the rows one by one without loading the whole XML file into memory? For example, to print all of the titles.

Provided the XML file is structured exactly as shown in the example, with each row element on a single line, BeautifulSoup could be used to parse the relevant lines. Something like this:

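from bs4 import BeautifulSoup as BS

with open('my.xml', encoding='utf-8') as xml:
    # The file object is iterated lazily, so only one line is held at a time.
    for line in map(str.strip, xml):
        if line.startswith('<row'):
            soup = BS(line, 'lxml')
            if row := soup.find('row'):
                # lxml's HTML parser lowercases attribute names, hence 'title'.
                if title := row.get('title'):
                    print(title)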

"Lines" in XML are pretty irrelevant; the relevant units are things like elements, attributes, start tags, and end tags.

A streaming parser (often called a SAX parser, though strictly speaking SAX is a Java API) will deliver the document to the application incrementally: not one line at a time, but one syntactic unit at a time.

See, for example, Python SAX Parser.
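As a rough illustration of the streaming approach, here is a minimal sketch using the standard library's xml.sax module. The names TitleHandler, on_title and print_titles are made up for this example, and it assumes the titles live in a Title attribute on row elements, as in the question's sample.

```python
import xml.sax


class TitleHandler(xml.sax.ContentHandler):
    """Passes the Title attribute of every <row> start tag to a callback."""

    def __init__(self, on_title=print):
        super().__init__()
        self.on_title = on_title  # hypothetical hook; defaults to printing

    def startElement(self, name, attrs):
        # Called once per start tag as the parser streams through the input.
        if name == "row" and "Title" in attrs:
            self.on_title(attrs["Title"])


def print_titles(path):
    """Stream the file at `path` and print each title; no tree is built."""
    xml.sax.parse(path, TitleHandler())
```

Calling print_titles('my.xml') would print the titles one by one. The parser reads the input in chunks, so a very long row does not need to sit on a single line.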

You can try something like this:

while line := file.readline():  # file is an already opened file object
    print(line)

Yes, you can use open(). It returns a file object and does not read the file content into RAM. So you want to do something like this:

with open('file_name') as file:
    for row in file:
        print(row)
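Iterating over the file object yields raw text lines, so each line still has to be parsed to get at a title. As one possible alternative that is not part of the answer above, the file object can be handed to xml.etree.ElementTree.iterparse, which reads it in chunks. This is a minimal sketch; the function name iter_titles is hypothetical, and it assumes the row/Title layout from the question.

```python
import xml.etree.ElementTree as ET


def iter_titles(source):
    """Yield the Title attribute of each <row>, one element at a time.

    `source` is a file path or a file object opened in binary mode.
    """
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the start of the root element (<posts>); keep a
    # reference so rows that were already handled can be dropped from it.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == "row":
            title = elem.get("Title")
            if title is not None:
                yield title
            root.clear()  # discard processed rows to keep memory use flat
```

For example:

```python
with open('my.xml', 'rb') as file:
    for title in iter_titles(file):
        print(title)
```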
