Transforming flat XML to a hierarchy (XML or Object) with C# and LINQ

I see there are a few articles on the same topic but the combination of a few factors makes this problem a little more challenging.

I need to transform flat XML that looks like this:

string myXML = "<a>" +
    "<b><levelNumber>01</levelNumber><name>top1</name></b>" +
    "<b><levelNumber>05</levelNumber><name>lev2a</name></b>" +
    "<b><levelNumber>05</levelNumber><name>lev2b</name></b>" +
    "<b><levelNumber>10</levelNumber><name>lev3a</name></b>" +
    "<b><levelNumber>10</levelNumber><name>lev3b</name></b>" +
    "<b><levelNumber>10</levelNumber><name>lev3c</name></b>" +
    "<b><levelNumber>05</levelNumber><name>lev2c</name></b>" +
    "<b><levelNumber>10</levelNumber><name>lev3d</name></b>" +
    "<b><levelNumber>01</levelNumber><name>top2</name></b>" +
    "<b><levelNumber>05</levelNumber><name>lev2d</name></b>" +
    "<b><levelNumber>05</levelNumber><name>lev2e</name></b>" +
    "<b><levelNumber>05</levelNumber><name>lev2f</name></b>" +
    "</a>";

... either to a nested object structure or even just to a hierarchical XML structure, preferably using LINQ. The value that determines the structure of the hierarchy is the levelNumber element: each element with a higher number should become a child of the nearest preceding element with a lower number.

The desired XML should look like this:

<?xml version="1.0" encoding="UTF-8"?>
<a>
    <b>
        <levelNumber>01</levelNumber>
        <name>top1</name>
    </b>
    <children>
        <b>
            <levelNumber>05</levelNumber>
            <name>lev2a</name>
        </b>
        <b>
            <levelNumber>05</levelNumber>
            <name>lev2b</name>
        </b>
        <children>
            <b>
                <levelNumber>10</levelNumber>
                <name>lev3a</name>
            </b>
            <b>
                <levelNumber>10</levelNumber>
                <name>lev3b</name>
            </b>
            <b>
                <levelNumber>10</levelNumber>
                <name>lev3c</name>
            </b>
        </children>
        <b>
            <levelNumber>05</levelNumber>
            <name>lev2c</name>
        </b>
        <children>
            <b>
                <levelNumber>10</levelNumber>
                <name>lev3d</name>
            </b>
        </children>
    </children>
    <b>
        <levelNumber>01</levelNumber>
        <name>top2</name>
    </b>
    <children>
        <b>
            <levelNumber>05</levelNumber>
            <name>lev2d</name>
        </b>
        <b>
            <levelNumber>05</levelNumber>
            <name>lev2e</name>
        </b>
        <b>
            <levelNumber>05</levelNumber>
            <name>lev2f</name>
        </b>
    </children>
</a>

While there are a few related solutions out there, the part that keeps tripping me up is that the same type of element needs to be nested.

I started by reading through the flat structure and peeking ahead, using a GetLevel function to extract the numeric value of the level:

int ownLevel = GetLevel(element);
var childElements = element.ElementsAfterSelf("b")
             .TakeWhile(x => GetLevel(x) > ownLevel);  // all following elements with a higher level number

I would then recurse through childElements (of, say, element lev2b, which has child elements), but coming out of that recursion there are more level 05 elements, and I am not sure how to capture those with a loop, LINQ, or both.

Also, to handle unlimited depth, the recursion has to live in a function that processes the child elements and repeats the same search for children within each of them.
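
One way to pick up the later level 05 elements after returning from a deeper run is to share a single cursor across the recursion. The following is a minimal sketch under that assumption, not a definitive implementation; the names Nester, Nest and Group are hypothetical, and GetLevel is assumed to simply parse the levelNumber text.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper class for illustration; not part of the original post.
public static class Nester
{
    // Assumed behaviour of the level helper: "05" parses as 5.
    static int GetLevel(XElement b) => int.Parse((string)b.Element("levelNumber"));

    public static XElement Nest(string xml)
    {
        XElement root = XDocument.Parse(xml).Root;
        List<XElement> items = root.Elements("b").ToList();
        var result = new XElement(root.Name);
        int index = 0;

        // Each call consumes at least one item, so this loop terminates.
        while (index < items.Count)
            result.Add(Group(items, ref index, GetLevel(items[index])));

        return result;
    }

    // Consumes items from index while their level is >= level and returns
    // them as siblings; each run of deeper items becomes one <children> wrapper.
    static List<XElement> Group(List<XElement> items, ref int index, int level)
    {
        var output = new List<XElement>();
        while (index < items.Count && GetLevel(items[index]) >= level)
        {
            int current = GetLevel(items[index]);
            if (current == level)
            {
                output.Add(new XElement(items[index])); // copy, leaving the source intact
                index++;
            }
            else
            {
                // Deeper run: the recursive call advances index past all of it,
                // so the loop resumes at the next sibling on this level.
                output.Add(new XElement("children", Group(items, ref index, current)));
            }
        }
        return output;
    }
}
```

Calling Nester.Nest(myXML) on the sample input should give an <a> element whose content alternates between <b> elements and <children> wrappers, in the shape requested above. Because the cursor is passed by reference, no element is visited by more than one recursion level.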

As I said at the beginning, this whole structure will eventually become a nested object structure, but if that is too difficult I am willing to work from a nested XML structure.

Any thoughts on this will be welcomed.

Thanks. Corneel.

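We came up with a solution that uses LINQ and produces hierarchical XML. Its shape differs slightly from the one requested: each <b> is wrapped in its own <children> element, and its child elements are nested inside that <b>.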

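// LINQPad "C# Program" query. Outside LINQPad, wrap everything in a class and add:
// using System; using System.Collections.Generic; using System.Linq; using System.Xml.Linq;
void Main()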
{
    var elements = GetElements(GetMyXml());
    //elements.Dump();
    var workingStorageVariables = GetWorkingStorageVariables(elements);
    //workingStorageVariables.Dump();

    var tree = new XElement("a",
        workingStorageVariables
            .Where(h => h.Level == 1)
            .Select
            (
                h => new XElement("children", 
                        new XElement("b",
                            new XElement("levelNumber", h.LevelText),
                            new XElement("name", h.VariableName),
                            GetChildVariables(workingStorageVariables, h))
                )
            )
    );

    Console.WriteLine(tree);
}

public static string GetMyXml()
{
    return "<a>" +
    "<b><levelNumber>01</levelNumber><name>top1</name></b>" +
    "<b><levelNumber>05</levelNumber><name>lev2a</name></b>" +
    "<b><levelNumber>05</levelNumber><name>lev2b</name></b>" +
    "<b><levelNumber>10</levelNumber><name>lev3a</name></b>" +
    "<b><levelNumber>10</levelNumber><name>lev3b</name></b>" +
    "<b><levelNumber>10</levelNumber><name>lev3c</name></b>" +
    "<b><levelNumber>05</levelNumber><name>lev2c</name></b>" +
    "<b><levelNumber>10</levelNumber><name>lev3d</name></b>" +
    "<b><levelNumber>01</levelNumber><name>top2</name></b>" +
    "<b><levelNumber>05</levelNumber><name>lev2d</name></b>" +
    "<b><levelNumber>05</levelNumber><name>lev2e</name></b>" +
    "<b><levelNumber>05</levelNumber><name>lev2f</name></b>" +
    "</a>";
}

public static IEnumerable<MyElement> GetElements(string xml)
{
    XDocument doc = XDocument.Parse(xml);

    return doc.Root
              .Elements()
              .Elements()
              .Select(x => new MyElement
              {
                  ElementName = x.Name.LocalName,
                  ElementValue = x.Value
              });
}

public static IEnumerable<WorkingStorageVariable> GetWorkingStorageVariables(IEnumerable<MyElement> elements)
{
    List<WorkingStorageVariable> workingStorageVariables = new List<WorkingStorageVariable>();

    int level = 0;
    string levelText = String.Empty;
    string variableName = String.Empty;

    foreach (var element in elements)
    {
        if (element.ElementName == "levelNumber")
        {
            levelText = element.ElementValue;
            level = Convert.ToInt32(element.ElementValue);
        }

        if (level != 0 && element.ElementName == "name")
        {
            variableName = element.ElementValue;
            workingStorageVariables.Add(new WorkingStorageVariable { Level = level, LevelText = levelText, VariableName = variableName });
            level = 0;
        }
    }

    return workingStorageVariables;
}

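public static IEnumerable<XElement> GetChildVariables(
            IEnumerable<WorkingStorageVariable> workingStorageVariables,
            WorkingStorageVariable parent)
{
    // Everything after the parent, up to the next entry at the parent's level or above.
    var descendants =
        workingStorageVariables
            .SkipWhile(h => h != parent)
            .Skip(1)
            .TakeWhile(h => h.Level > parent.Level)
            .ToList();

    // Keep only direct children (assumed to share the first descendant's level);
    // deeper entries are emitted by the recursive call instead of being repeated here.
    return
        descendants
            .Where(h => h.Level == descendants[0].Level)
            .Select(h =>
                new XElement("children",
                    new XElement("b",
                        new XElement("levelNumber", h.LevelText),
                        new XElement("name", h.VariableName),
                        GetChildVariables(workingStorageVariables, h)
                    ))
            );
}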

public class MyElement
{
    public string ElementName { get; set; }
    public string ElementValue { get; set; }
}

public class WorkingStorageVariable
{
    public int Level { get; set; }
    public string LevelText { get; set; }
    public string VariableName { get; set; }
}
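
Since the question ultimately asks for a nested object structure, a single pass with a stack of open ancestors may be enough to build one directly. This is only a sketch under that assumption; the Node class and the FlatToTree.Build method are hypothetical names that do not appear in the original post.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical node type for illustration; not part of the original post.
public class Node
{
    public int Level { get; set; }
    public string Name { get; set; }
    public List<Node> Children { get; } = new List<Node>();
}

public static class FlatToTree
{
    // Reads the <b> elements in document order. Each one becomes a child of
    // the nearest preceding element with a lower level number.
    public static List<Node> Build(string xml)
    {
        var roots = new List<Node>();
        var open = new Stack<Node>(); // ancestors of the element being read

        foreach (XElement b in XDocument.Parse(xml).Root.Elements("b"))
        {
            var node = new Node
            {
                Level = int.Parse((string)b.Element("levelNumber")), // "05" parses as 5
                Name = (string)b.Element("name")
            };

            // Drop every open node that cannot be a parent of this one.
            while (open.Count > 0 && open.Peek().Level >= node.Level)
                open.Pop();

            if (open.Count == 0)
                roots.Add(node);                // no lower level above it: a root
            else
                open.Peek().Children.Add(node); // nearest lower level is the parent

            open.Push(node);
        }

        return roots;
    }
}
```

For the sample input, FlatToTree.Build(GetMyXml()) should return two roots, top1 and top2, with lev2b holding lev3a, lev3b and lev3c. The stack keeps the depth unlimited without any recursion, at the cost of assuming the input is already in document order.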
