XmlReader reading from a fixed-length buffer

Question

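An incoming stream arrives in a fixed 1024-byte buffer. The stream itself is a huge XML file, so it may take many rounds of reading to get through it. My goal is to read the buffer and work out how many times a given element occurs in the large XML file.

My challenge is that, because the buffer has a fixed length, there is no guarantee that each buffer holds well-formed XML. If I wrap the stream in an XmlTextReader, I always get an exception and cannot finish reading. For example, an element could be abcdef, with the first buffer ending in abc and the second buffer starting with def. I am very frustrated by this. Can anyone suggest a better way to achieve this in a streaming fashion? (I don't want to load the whole content into memory.)

Many thanks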

Answer 1

Does your 1024-byte buffer come from one of the standard, concrete implementations of System.IO.Stream? If so, you can simply create your XmlReader around the underlying stream:

XmlReader tr = XmlReader.Create( myStreamInstance ) ;
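For the counting part of the question, something along these lines might do. This is only a sketch: the class name ElementCounter and its Count method are made up for illustration and are not from the original answer. It relies on XmlReader doing its own buffering, so a tag that is split across two underlying reads should not matter as long as the stream hands over the bytes in order.

```csharp
using System.IO;
using System.Xml;

public static class ElementCounter
{
    // Hypothetical helper: counts start tags (including empty elements such as <a/>)
    // whose local name equals elementName, without building a document in memory.
    public static int Count(Stream xml, string elementName)
    {
        int count = 0;
        using (XmlReader reader = XmlReader.Create(xml))
        {
            // Read() advances one node at a time and returns false at end of input.
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == elementName)
                {
                    count++;
                }
            }
        }
        return count;
    }
}
```

Calling ElementCounter.Count( myStreamInstance , "abcdef" ) would then return the number of abcdef start tags. Matching on LocalName ignores namespaces, which may or may not be what you want.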

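If not (for example, you are "reading" buffers from some kind of API), you need to implement your own concrete Stream along these lines (all you should need to do is flesh out the ReadNextFrame() method and possibly implement your constructor):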

using System ;

public class MyStream : System.IO.Stream
{
    public override bool CanRead  { get { return true  ; } }
    public override bool CanSeek  { get { return false ; } }
    public override bool CanWrite { get { return false ; } }
    public override long Length   { get { throw new NotImplementedException(); } }
    public override long Position {
                                    get { throw new NotImplementedException(); }
                                    set { throw new NotImplementedException(); }
                                  }

    public override int Read( byte[] buffer , int offset , int count )
    {
        int bytesRead = 0 ;

        if ( !initialized )
        {
            Initialize() ;
        }

        for ( int bytesRemaining = count ; !atEOF && bytesRemaining > 0 ; )
        {

            int frameRemaining = frameLength - frameOffset ;
            int chunkSize      = ( bytesRemaining > frameRemaining ? frameRemaining : bytesRemaining ) ;

            // copy from the current frame into the caller's buffer
            Array.Copy( frame , frameOffset , buffer , offset , chunkSize ) ;

            bytesRemaining -= chunkSize ;
            offset         += chunkSize ;
            frameOffset    += chunkSize ;
            bytesRead      += chunkSize ;

            // read next frame if necessary
            if ( frameOffset >= frameLength )
            {
                ReadNextFrame() ;
            }

        }

        return bytesRead ;
    }

    public override long Seek( long offset , System.IO.SeekOrigin origin ) { throw new NotImplementedException(); }
    public override void SetLength( long value )                           { throw new NotImplementedException(); }
    public override void Write( byte[] buffer , int offset , int count )   { throw new NotImplementedException(); }
    public override void Flush()                                           { throw new NotImplementedException(); }

    private byte[] frame       = null  ;
    private int    frameLength = 0     ;
    private int    frameOffset = 0     ;
    private bool   atEOF       = false ;
    private bool   initialized = false ;

    private void Initialize()
    {
        if ( initialized ) throw new InvalidOperationException() ;

        frame       = new byte[1024] ;
        frameLength = 0 ;
        frameOffset = 0 ;
        atEOF       = false ;
        initialized = true ;

        ReadNextFrame() ;

        return ;
    }

    private void ReadNextFrame()
    {

        //TODO: read the next (or first) 1024-byte buffer into frame
        //TODO: set the frame length to the number of bytes actually returned (might be less than 1024 on the last read, right?)
        //TODO: set the frame offset to 0
        //TODO: set the atEOF flag if we've exhausted the data source

        return ;

    }

}

Then instantiate your XmlReader as above:

System.IO.Stream     s  = new MyStream() ;
System.Xml.XmlReader xr = System.Xml.XmlReader.Create( s ) ;
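To make the ReadNextFrame() idea a bit more concrete, here is one way it could be filled in. Treat it as a sketch under assumptions: FrameStream and the readFrame delegate are hypothetical names, and the delegate stands in for whatever API actually hands you the 1024-byte buffers. It is assumed to fill the array it is given and return the number of bytes written, with 0 meaning there is no more data (and to keep returning 0 after that).

```csharp
using System;
using System.IO;

// Sketch only: FrameStream and readFrame are illustrative names, not a real API.
public class FrameStream : Stream
{
    private readonly Func<byte[], int> readFrame;   // assumed: fills the frame, returns byte count, 0 at end
    private readonly byte[] frame = new byte[1024];
    private int frameLength = 0;
    private int frameOffset = 0;

    public FrameStream(Func<byte[], int> readFrame)
    {
        if (readFrame == null) throw new ArgumentNullException(nameof(readFrame));
        this.readFrame = readFrame;
    }

    public override bool CanRead  { get { return true; } }
    public override bool CanSeek  { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length   { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (frameOffset >= frameLength)
        {
            // The current frame is used up (or none was read yet): fetch the next one.
            frameLength = readFrame(frame);
            frameOffset = 0;
            if (frameLength <= 0)
            {
                frameLength = 0;
                return 0;                            // 0 signals end of stream to the caller
            }
        }

        // Hand over at most what is left in the current frame; returning fewer
        // bytes than requested is permitted for Stream.Read.
        int chunk = Math.Min(count, frameLength - frameOffset);
        Array.Copy(frame, frameOffset, buffer, offset, chunk);
        frameOffset += chunk;
        return chunk;
    }

    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}
```

Unlike the skeleton above, this version returns after at most one frame per Read call rather than looping until count bytes are filled. Either behaviour should be acceptable to XmlReader; the short-read variant is simply less code.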

Cheers!

Answer 2

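That's a somewhat odd goal... Usually it's more like "count the elements, but don't load the whole XML into memory", which is straightforward: write a Stream-derived class that presents your buffers as a forward-only stream (similar to NetworkStream), then read the XML as usual with XmlReader (for example via LINQ), but don't construct an XmlDocument.

If you clarify your goal, it may be easier for others to make suggestions.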
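The "XmlReader plus LINQ, but no XmlDocument" idea from this answer could look roughly like the sketch below. The names XmlStreaming, StreamElements and CountElements are invented for the example. It uses XNode.ReadFrom to materialize one matching element at a time, so only that element is held in memory, not the whole file.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class XmlStreaming
{
    // Hypothetical helper: lazily yields each element with the given local name.
    public static IEnumerable<XElement> StreamElements(Stream xml, string elementName)
    {
        using (XmlReader reader = XmlReader.Create(xml))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == elementName)
                {
                    // ReadFrom builds this one element and leaves the reader on the
                    // node that follows it, so no extra Read() is needed here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    // Counting is then an ordinary LINQ query over the lazy sequence.
    public static int CountElements(Stream xml, string elementName)
    {
        return StreamElements(xml, elementName).Count();
    }
}
```

One caveat with this approach: an element of the same name nested inside a matched element is swallowed by ReadFrom and not counted separately, and if the root element itself matches, the whole document gets materialized. For a plain count, a bare reader.Read() loop is cheaper; the LINQ form mostly pays off when you also want to inspect each element.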

Answer 1  2  2011-01-21 01:09:03
Answer 2  0  2011-01-21 00:38:43