C#: How to read a single file with normal and XML text elements

I am receiving a stream of data from a web service and trying to save the contents of the stream to a file. The stream contains standard lines of text alongside large chunks of XML data (each on a single line). The file is about 800 MB.

Problem: I get an out-of-memory exception when I process the XML section of each line.

==start file
line 1
line 2
<?xml version=.....huge line etc</xml>
line 3
line4
<?xml version=.....huge line etc</xml>
==end file

Current code. As you can see, when it reads in a huge XML line, memory usage spikes.

string readLine;
using (StreamReader reader = new StreamReader(downloadStream))
{
    while ((readLine = reader.ReadLine()) != null)
    {
        streamWriter.WriteLine(readLine); // writes to file
    }
}

I was trying to think of a solution that uses a TextReader/StreamReader and an XmlTextReader in combination to process each section. When I reach an XML section, I could switch to the XmlTextReader and use its Read() method to read one node at a time, avoiding the memory spike.

Any suggestions on how I could do this? Alternatively, could I create a custom XmlTextReader that is able to read these lines? Any pointers for this?

Update

A further problem is that I need to read this file back in and split the two XML sections out into separate XML files. I converted the solution to write the file using a binary writer and then started to read the file back in using a binary reader. I have text processing that detects the start of an XML section, and specifically which XML section it is, so I can map it to the correct file. However, this causes problems when reading the binary file and doing the detection:

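string streamLine;
using (BinaryReader reader = new BinaryReader(savedFileStream))
{
    // Note: ReadString() expects the length-prefixed format written by
    // BinaryWriter.Write(string), and it throws EndOfStreamException at the
    // end of the stream rather than returning null.
    while ((streamLine = reader.ReadString()) != null)
    {
        if (streamLine.StartsWith("<?xml version=\"1.0\" ?><tag1"))
        {
            // xml file 1
        }
        else if (streamLine.StartsWith("<?xml version=\"1.0\" ?><tag2"))
        {
            // xml file 2
        }
    }
}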

The XML may hold all of its content on a single line, so you are probably better off using a binary reader/writer, where you control the read/write size.

Here is an example that reads up to BUFFER_SIZE bytes on each iteration:

        // Placeholder streams: in practice s is the download stream
        // and outputStream is the target file stream.
        Stream s = new MemoryStream();
        Stream outputStream = new MemoryStream();
        const int BUFFER_SIZE = 1024;
        using (BinaryReader reader = new BinaryReader(s))
        using (BinaryWriter writer = new BinaryWriter(outputStream))
        {
            byte[] buffer = new byte[BUFFER_SIZE];
            int read;
            // Read returns 0 once the end of the stream is reached.
            while ((read = reader.Read(buffer, 0, BUFFER_SIZE)) > 0)
            {
                writer.Write(buffer, 0, read);
            }

            writer.Flush();
        }

I don't know whether this causes you problems with encodings etc., but I think you will have to read the file as binary.
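For the follow-up in the question (routing each XML section to its own file), the same fixed-size-chunk idea might be applied with a text reader instead of a binary one. The following is only a minimal sketch, not a tested solution for the real data: the class name SectionSplitter and its method names are hypothetical, the two marker strings are copied from the question's detection code, and it assumes every XML section starts at the beginning of a line with exactly that declaration. It keeps only the first few characters of each line in memory to decide the destination, then streams the rest of the line through.

```csharp
using System;
using System.IO;
using System.Text;

static class SectionSplitter
{
    // Markers copied from the question; assumed to appear at the start of a line.
    const string Tag1 = "<?xml version=\"1.0\" ?><tag1";
    const string Tag2 = "<?xml version=\"1.0\" ?><tag2";

    public static void Split(TextReader input, TextWriter text,
                             TextWriter xml1, TextWriter xml2)
    {
        int prefixLength = Math.Max(Tag1.Length, Tag2.Length);
        var prefix = new StringBuilder(prefixLength);
        TextWriter target = null;      // null while the current line is undecided
        char[] buffer = new char[4096];
        int read;

        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                char c = buffer[i];
                if (target == null)
                {
                    // Collect just enough of the line to recognise a marker.
                    prefix.Append(c);
                    if (c == '\n' || prefix.Length == prefixLength)
                    {
                        target = Choose(prefix.ToString(), text, xml1, xml2);
                        target.Write(prefix.ToString());
                        prefix.Clear();
                    }
                }
                else
                {
                    target.Write(c);   // stream the rest of the line
                }

                if (c == '\n')
                    target = null;     // the next character starts a new line
            }
        }

        // A final short line without a trailing newline is still pending.
        if (prefix.Length > 0)
            Choose(prefix.ToString(), text, xml1, xml2).Write(prefix.ToString());
    }

    static TextWriter Choose(string start, TextWriter text,
                             TextWriter xml1, TextWriter xml2)
    {
        if (start.StartsWith(Tag1, StringComparison.Ordinal)) return xml1;
        if (start.StartsWith(Tag2, StringComparison.Ordinal)) return xml2;
        return text;
    }
}
```

In use, the reader would typically be a StreamReader over the saved file and the three writers StreamWriter instances over the output files, all created with the same encoding as the source. Memory use is bounded by the 4096-character buffer and the marker length, regardless of how long an XML line is.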

If all you want to do is copy one stream to another without modifying the data, you don't need the text or binary helpers (StreamReader, StreamWriter, BinaryReader, BinaryWriter, etc.). Simply copy the stream.

internal static class StreamExtensions
{
    public static void CopyTo(this Stream readStream, Stream writeStream)
    {
        byte[] buffer = new byte[4096];
        int read;
        while ((read = readStream.Read(buffer, 0, buffer.Length)) > 0)
            writeStream.Write(buffer, 0, read);
    }
}
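On .NET Framework 4 and later, Stream already has a built-in CopyTo instance method, which takes precedence over an extension method of the same name, so the helper above is only needed on earlier versions. As a minimal sketch of saving a download straight to disk with the built-in method (the class name DownloadSaver and the method SaveToFile are hypothetical, and the buffer size is an arbitrary choice):

```csharp
using System.IO;

static class DownloadSaver
{
    // Copies the source stream to a file in fixed-size chunks, so memory use
    // stays bounded by the buffer size no matter how long a line is.
    public static void SaveToFile(Stream source, string path)
    {
        using (var file = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            source.CopyTo(file, 81920); // built-in since .NET Framework 4
        }
    }
}
```

Because the bytes are copied unchanged, the saved file keeps whatever encoding the web service sent.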

I think there is a memory leak.

Do you get the out-of-memory exception after processing a few lines, or on the very first line?
Also, there is no streamWriter.Flush() inside the while loop.
Don't you think there should be one?
