Parse the Nodes of XML files

How can I parse all the XML files under a given directory as input to the application and write the output to a text file?

Note: The XML is not always the same; the nodes in the XML can vary and can have any number of child nodes.

Any help or guidance in this regard would be much appreciated :)

XML file sample

<CATALOG>
<CD>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>
<CNT>USA</CNT>
<CODE>3456</CODE>
</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
</CD>
<CD>
<TITLE>Hide your heart</TITLE>
<ARTIST>Bonnie Tyler</ARTIST>
<COUNTRY>UK</COUNTRY>
<COMPANY>CBS Records</COMPANY>
<PRICE>9.90</PRICE>
<YEAR>1988</YEAR>
</CD>
</CATALOG>

C# code

using System;
using System.Collections.Generic;
using System.Windows.Forms;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO;
using System.Data;
using System.Xml;
using System.Xml.Linq;

namespace XMLTagParser
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Please Enter the Location of the file");

            // get the location we want to get the sitemaps from 
            string dirLoc = Console.ReadLine();

            // get all the sitemaps 
            string[] sitemaps = Directory.GetFiles(dirLoc);
            StreamWriter sw = new StreamWriter(Application.StartupPath + @"\locs.txt", true);

            // loop through each file 
            foreach (string sitemap in sitemaps)
            {
                try
                {
                    // new xdoc instance 
                    XmlDocument xDoc = new XmlDocument();

                    //load up the xml from the location 
                    xDoc.Load(sitemap);

                    // cycle through each child node 
                    foreach (XmlNode node in xDoc.DocumentElement.ChildNodes)
                    {
                        // first node is the url ... have to go to nested loc node 
                        foreach (XmlNode locNode in node)
                        {
                            string loc = locNode.Name;

                            // write it to the console so you can see it's working 
                            Console.WriteLine(loc + Environment.NewLine);

                            // write it to the file 
                            sw.Write(loc + Environment.NewLine);
                        }
                    }
                }
                catch
                {
                    Console.WriteLine("Error :-(");
                }
            }
            // flush and close the output file so buffered lines reach disk
            sw.Close();
            Console.WriteLine("All Done :-)");
            Console.ReadLine();
        }
    }
}

Preferred output:

CATALOG/CD/TITLE
CATALOG/CD/ARTIST
CATALOG/CD/COUNTRY/CNT
CATALOG/CD/COUNTRY/CODE
CATALOG/CD/COMPANY
CATALOG/CD/PRICE
CATALOG/CD/YEAR

CATALOG/CD/TITLE
CATALOG/CD/ARTIST
CATALOG/CD/COUNTRY
CATALOG/CD/COMPANY
CATALOG/CD/PRICE
CATALOG/CD/YEAR

This is a recursive problem, and what you are looking for is called 'tree traversal'. For each child node, you look into its children, then into those nodes' children (if they have any), and so on, recording the 'path' as you go along but only printing it out when you reach a 'leaf' node.

You will need a function like this to 'traverse' the tree:

static void traverse(XmlNodeList nodes, string parentPath)
{
    foreach (XmlNode node in nodes)
    {
        string thisPath = parentPath;
        if (node.NodeType != XmlNodeType.Text)
        {
            //Prevent adding "#text" at the end of every chain
            thisPath += "/" + node.Name;
        }

        if (!node.HasChildNodes)
        {
            //Only print out this path if it is at the end of a chain
            Console.WriteLine(thisPath);
        }

        //Look into the child nodes using this function recursively
        traverse(node.ChildNodes, thisPath);
    }
}

And then here is how I would add it into your program (within your foreach sitemap loop):

try
{
    // new xdoc instance 
    XmlDocument xDoc = new XmlDocument();

    //load up the xml from the location 
    xDoc.Load(sitemap);

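    // start traversing from the children of the root element
    // (DocumentElement rather than FirstChild, which can be the <?xml ...?> declaration)
    XmlElement rootNode = xDoc.DocumentElement;
    traverse(rootNode.ChildNodes, rootNode.Name);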
}
catch
{
    Console.WriteLine("Error :-(");
}
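The question also asks for the output to go to a text file, which the snippet above does not do yet. Below is a minimal sketch of one way to put the pieces together. It is an illustration under assumptions, not the only way to do it: the class name XmlLeafPathDump, the method DumpDirectory, the "*.xml" filter and the output name locs.txt are all hypothetical choices, and the file is opened in append mode only because the question's code does so.

```csharp
using System;
using System.IO;
using System.Xml;

static class XmlLeafPathDump
{
    // Same recursion as the traverse function above, but it writes to any
    // TextWriter (a file, the console, a StringWriter) instead of the console only.
    public static void Traverse(XmlNodeList nodes, string parentPath, TextWriter output)
    {
        foreach (XmlNode node in nodes)
        {
            string thisPath = parentPath;
            if (node.NodeType != XmlNodeType.Text)
            {
                // Prevent adding "#text" at the end of every chain
                thisPath += "/" + node.Name;
            }

            if (!node.HasChildNodes)
            {
                // Only write this path if it is at the end of a chain
                output.WriteLine(thisPath);
            }

            Traverse(node.ChildNodes, thisPath, output);
        }
    }

    // Appends the leaf paths of every *.xml file in dirLoc to outputFile.
    // Assumption: only files matching "*.xml" should be parsed.
    public static void DumpDirectory(string dirLoc, string outputFile)
    {
        // The using block flushes and closes the file even if an exception escapes.
        using (StreamWriter sw = new StreamWriter(outputFile, true))
        {
            foreach (string file in Directory.EnumerateFiles(dirLoc, "*.xml"))
            {
                try
                {
                    XmlDocument xDoc = new XmlDocument();
                    xDoc.Load(file);

                    XmlElement root = xDoc.DocumentElement;
                    Traverse(root.ChildNodes, root.Name, sw);
                }
                catch (XmlException ex)
                {
                    // Malformed XML: report it and continue with the next file
                    Console.WriteLine("Skipping " + file + ": " + ex.Message);
                }
            }
        }
    }

    static void Main()
    {
        Console.WriteLine("Please enter the directory containing the XML files");
        string dirLoc = Console.ReadLine();

        // Hypothetical output name, relative to the current working directory
        DumpDirectory(dirLoc, "locs.txt");
        Console.WriteLine("All done");
    }
}
```

Passing a TextWriter into the recursion keeps the traversal independent of where the paths end up, so the same function can write to Console.Out while you are debugging. Only XmlException is caught here; I/O errors such as a missing directory are left to propagate.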

I made use of this other helpful answer: Traverse a XML using Recursive function
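If you would rather avoid writing the recursion by hand, the same leaf paths can probably be produced with LINQ to XML (System.Xml.Linq, which your program already imports). This is a different technique from the traverse function, shown only as a rough sketch: the class name LeafPaths and the method FromXml are made up for illustration, and "leaf" here means an element with no child elements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class LeafPaths
{
    // Returns one "ROOT/CHILD/..." path per element that has no child elements,
    // in document order.
    public static List<string> FromXml(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        return doc.Descendants()
                  .Where(e => !e.HasElements)
                  .Select(e => string.Join("/",
                      // AncestorsAndSelf starts at the element itself,
                      // so reverse it to get a root-first path
                      e.AncestorsAndSelf().Reverse().Select(a => a.Name.LocalName)))
                  .ToList();
    }

    static void Main()
    {
        string xml = "<CATALOG><CD><TITLE>Empire Burlesque</TITLE>"
                   + "<COUNTRY><CNT>USA</CNT><CODE>3456</CODE></COUNTRY></CD></CATALOG>";

        foreach (string path in FromXml(xml))
        {
            Console.WriteLine(path);
        }
    }
}
```

For a file on disk, XDocument.Load(path) can be used in place of XDocument.Parse. One difference from the recursive version: an element that mixes text with child elements is not reported here, whereas traverse prints its path once for the text node.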

Hope this helps! :)
