
Is there a way to read raw content from XmlReader?

I have a very large XML file, so I am using XmlReader in C#. The problem is that some of the content contains XML-like markers that XmlReader should not treat as markup.

<Narf name="DOH">Mark's test of <newline> like stuff</Narf>

This is legacy data, so it cannot be refactored... (of course)

I have tried ReadInnerXml but get the whole node. I have tried ReadElementContentAsString but get an exception saying 'newline' is not closed.

// Neither attempt copes with markup inside the content
// Attempt 1:
ms.mText = reader.ReadElementContentAsString();
// Attempt 2:
XElement el = XNode.ReadFrom(reader) as XElement;
ms.mText = el.ToString();

What I want is ms.mText to equal "Mark's test of <newline> like stuff" and not an exception.

System.Xml.XmlException was unhandled
  HResult=-2146232000
  LineNumber=56
  LinePosition=63
  Message=The 'newline' start tag on line 56 position 42 does not match the end tag of 'Narf'. Line 56, position 63.
  Source=System.Xml

The question flagged as a duplicate did not solve the problem, because its answer requires changing the input to remove the offending markup before the data is used. As stated above, this is legacy data.

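I figured it out based on the responses here! Not elegant, but it works: a TextReader that sits between the file and XmlReader, keeps a 50-character look-ahead buffer, and swaps the non-XML tags for placeholders before XmlReader sees them.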

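   using System.IO;

   // TextReader that rewrites known non-XML tags before XmlReader sees them
   public class TextWedge : TextReader
   {
      // Look-ahead size; must be at least as long as the longest tag replaced
      private const int LookAhead = 50;
      private readonly StreamReader mSr;
      private string mBuffer = "";

      public TextWedge(string filename)
      {
         mSr = File.OpenText(filename);
         // Prime the look-ahead buffer, stopping early if the file is shorter
         for (int i = 0; i < LookAhead; i++)
         {
            int ic = mSr.Read();
            if (ic == -1) break;
            mBuffer += (char)ic;
         }
         mBuffer = mBuffer.Replace("<newline>", "[br]");
      }

      public override int Peek()
      {
         // Next character that Read() would return, or -1 at end of input
         return mBuffer.Length > 0 ? mBuffer[0] : -1;
      }

      public override int Read()
      {
         int iRet = -1;
         if (mBuffer.Length > 0)
         {
            iRet = mBuffer[0];
            mBuffer = mBuffer.Remove(0, 1);
            int ic = mSr.Read();
            if (ic != -1)
            {
               mBuffer += (char)ic;
               // Run through the battery of non-xml tags
               mBuffer = mBuffer.Replace("<newline>", "[br]");
            }
         }
         return iRet;
      }

      protected override void Dispose(bool disposing)
      {
         // Close the underlying file when the wedge is disposed
         if (disposing) mSr.Dispose();
         base.Dispose(disposing);
      }
   }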
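
The wedge substitutes [br] for the tag, so the text that comes out is not quite the string the question asks for. If the literal text is wanted, one option is to escape the tag as &lt;newline&gt; instead, so that XmlReader decodes it back to <newline>. The following is only a minimal sketch of that idea, not a drop-in replacement: TagEscapingReader, Demo and ReadNarf are hypothetical names, it wraps any TextReader rather than a file name, and it handles a single tag.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Text;

// Hypothetical variant of the wedge: escapes one known non-XML tag so that
// XmlReader treats it as ordinary text instead of markup.
public sealed class TagEscapingReader : TextReader
{
    private const string Tag = "<newline>";
    private const string Escaped = "&lt;newline&gt;";

    private readonly TextReader inner;
    private readonly StringBuilder window = new StringBuilder(); // look-ahead, at most Tag.Length chars
    private readonly Queue<char> output = new Queue<char>();     // chars ready to hand out

    public TagEscapingReader(TextReader inner)
    {
        this.inner = inner;
    }

    public override int Peek()
    {
        if (output.Count == 0) Refill();
        return output.Count > 0 ? output.Peek() : -1;
    }

    public override int Read()
    {
        if (output.Count == 0) Refill();
        return output.Count > 0 ? output.Dequeue() : -1;
    }

    // Tops up the look-ahead window, then releases either the escaped tag
    // (if the window holds exactly the tag) or a single character.
    private void Refill()
    {
        while (window.Length < Tag.Length)
        {
            int c = inner.Read();
            if (c == -1) break;
            window.Append((char)c);
        }
        if (window.Length == 0) return; // end of input

        if (WindowIsTag())
        {
            foreach (char ch in Escaped) output.Enqueue(ch);
            window.Clear();
        }
        else
        {
            output.Enqueue(window[0]);
            window.Remove(0, 1);
        }
    }

    private bool WindowIsTag()
    {
        if (window.Length != Tag.Length) return false;
        for (int i = 0; i < Tag.Length; i++)
        {
            if (window[i] != Tag[i]) return false;
        }
        return true;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) inner.Dispose();
        base.Dispose(disposing);
    }
}

public static class Demo
{
    // Reads the text of the first Narf element through the escaping reader.
    public static string ReadNarf(string xml)
    {
        using (var reader = XmlReader.Create(new TagEscapingReader(new StringReader(xml))))
        {
            reader.ReadToFollowing("Narf");
            return reader.ReadElementContentAsString();
        }
    }

    public static void Main()
    {
        string xml = "<Narf name=\"DOH\">Mark's test of <newline> like stuff</Narf>";
        Console.WriteLine(ReadNarf(xml)); // prints: Mark's test of <newline> like stuff
    }
}
```

For a file, the same reader could wrap File.OpenText(filename). As with the wedge, line positions reported by XmlReader refer to the rewritten text, not the original file.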
