How to remove #text from an XML file using C#

I am trying to read an XML file, and while reading it I get #text as a value. I have removed all whitespace, but #text still keeps appearing. What is the solution?

This is my original XML file:

<book genre='novel' ISBN='1-861003-78' misc='sale-item'>
  <title>The Handmaid's Tale</title>
  <price>14.95</price>
</book>

This is my new XML file after removing whitespace:

<!--sample XML fragment--><book genre='novel' ISBN='1-861003-78' misc='sale-item'><title>The Handmaid's Tale</title><price>14.95</price></book>

I am trying to validate two XML files, and this is the code:

    static bool structValidate(XmlNodeList xmlOldNode, XmlNodeList xmlNewNode)
    {
        if (xmlOldNode.Count != xmlNewNode.Count) return true;

        for (var i = 0; i < xmlOldNode.Count; i++)
        {
            var nodeA = xmlOldNode[i];
            var nodeB = xmlNewNode[i];
            Console.WriteLine("\n" + nodeA.Name + ":");
            Console.WriteLine("\n" + nodeB.Name + ":");
            Console.ReadLine();

            if (nodeA.Attributes == null)
            {
                if (nodeB.Attributes != null)
                    return true;
                else
                    continue;
            }

            if (nodeA.Attributes.Count != nodeB.Attributes.Count
                || nodeA.Name != nodeB.Name) return true;

            for (var j = 0; j < nodeA.Attributes.Count; j++)
            {
                var attrA = nodeA.Attributes[j];
                var attrB = nodeB.Attributes[j];
                Console.WriteLine(attrA.Name);
                Console.WriteLine(attrB.Name);
                Console.ReadLine();
                if (attrA.Name != attrB.Name) return true;
            }

            if (nodeA.HasChildNodes && nodeB.HasChildNodes)
            {
                return structValidate(nodeA.ChildNodes, nodeB.ChildNodes);
            }
            else
            {
                return false;
            }
        }
        return false;
    }

When I print the node names, I get #text.

The #text nodes are the whitespace returned by the parser for your old XML file: the indentation before the <title> and <price> nodes.

The fault is in the way you load the old XML file: it parses the whitespace as XML nodes.
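The question does not show how the files are loaded, so the following is only a minimal sketch, assuming they are read into an XmlDocument. It lists the names of the root element's direct children with XmlDocument.PreserveWhitespace switched on or off; the class name WhitespaceDemo and the helper ChildNames are made up for this illustration. With PreserveWhitespace set to false (the default), the indentation between elements should not be kept as child nodes, so only title and price are listed.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class WhitespaceDemo
{
    // Hypothetical helper: returns the Name of each direct child of the root element.
    public static List<string> ChildNames(string xml, bool preserveWhitespace)
    {
        // PreserveWhitespace must be set before loading to have any effect.
        var doc = new XmlDocument { PreserveWhitespace = preserveWhitespace };
        doc.LoadXml(xml);

        var names = new List<string>();
        foreach (XmlNode child in doc.DocumentElement.ChildNodes)
            names.Add(child.Name);
        return names;
    }

    static void Main()
    {
        // Same shape as the indented file in the question.
        const string xml =
            "<book genre='novel' ISBN='1-861003-78' misc='sale-item'>\n" +
            "  <title>The Handmaid's Tale</title>\n" +
            "  <price>14.95</price>\n" +
            "</book>";

        // Whitespace between elements is dropped: only the two elements remain.
        Console.WriteLine(string.Join(", ", ChildNames(xml, preserveWhitespace: false)));

        // Whitespace between elements is kept as extra child nodes.
        Console.WriteLine(string.Join(", ", ChildNames(xml, preserveWhitespace: true)));
    }
}
```

Comparing the two printed lines for your own files is a quick way to check whether the loading step is what introduces the extra nodes.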

Your way of parsing would treat these two XML files as the same:

<book genre='novel' ISBN='1-861003-78' misc='sale-item'>
  <title>The Handmaid's Tale</title>
  <price>14.95</price>
</book>

<book genre='novel' ISBN='1-861003-78' misc='sale-item'>
someUnformatedText<title>The Handmaid's Tale</title>
someUnformatedText<price>14.95</price>
</book>

This is what the documentation for XmlNode.Name says:

The qualified name of the node. The name returned is dependent on the NodeType of the node:

Text -> #text
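Because XmlNode.Name reports #text for every text node, one way to keep such nodes out of a structural comparison is to filter on NodeType before comparing. The following is a minimal sketch, not the original poster's code: ElementChildren is a hypothetical helper introduced here. Note that the string inside <title> is itself a text node, so it also shows up as #text when the comparison recurses into the element's children.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class ElementFilter
{
    // Hypothetical helper: keeps only element children and drops
    // text, whitespace, and comment nodes.
    public static List<XmlNode> ElementChildren(XmlNode parent)
    {
        var result = new List<XmlNode>();
        foreach (XmlNode child in parent.ChildNodes)
        {
            if (child.NodeType == XmlNodeType.Element)
                result.Add(child);
        }
        return result;
    }

    static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<book><title>The Handmaid's Tale</title><price>14.95</price></book>");

        XmlNode title = doc.DocumentElement.FirstChild;

        // The title's text content is a child node named "#text".
        Console.WriteLine(title.FirstChild.Name);

        // After filtering, <title> has no element children left to compare.
        Console.WriteLine(ElementChildren(title).Count);

        // The root's element children are title and price.
        foreach (XmlNode node in ElementChildren(doc.DocumentElement))
            Console.WriteLine(node.Name);
    }
}
```

To use this with structValidate, the method would need to accept the filtered lists instead of XmlNodeList, which is a change to the signature shown in the question.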
