Class using XmlReader is not grabbing XML data as expected

I have a class in my app that is not transforming my XML data as expected.

XML Data

Below is an excerpt of the XML. The file is typically between 2 GB and 3 GB and represents mutual funds. A fund can have multiple ManagerDetail nodes or none at all, and each manager can have multiple CollegeEducation nodes or none at all.

<MutualFund>
<ManagerList>
<ManagerDetail>
    <ManagerRole>M</ManagerRole>
    <ManagerId>7394</ManagerId>
    <ManagerTenure>3.67</ManagerTenure>
    <StartDate>2011-09-30</StartDate>
    <OwnershipLevel>6</OwnershipLevel>
    <GivenName>Stephen</GivenName>
    <MiddleName>M.</MiddleName>
    <FamilyName>Kane</FamilyName>
    <Gender>M</Gender>
    <Certifications>
        <CertificationName>CFA</CertificationName>
    </Certifications>
    <CollegeEducations>
        <CollegeEducation>
            <School>University of Chicago</School>
            <Year>1990</Year>
            <Degree>M.B.A.</Degree>
        </CollegeEducation>
        <CollegeEducation>
            <School>University of California - Berkeley</School>
            <Year>1985</Year>
            <Degree>B.S.</Degree>
            <Major>Business</Major>
        </CollegeEducation>
    </CollegeEducations>
</ManagerDetail>
</ManagerList>
</MutualFund>

C# Class

I've created a class that is called within a BackgroundWorker instance in another form. This class places the above data into the following table:

public static DataTable dtManagersEducation = new DataTable();
dtManagersEducation.Columns.Add("ManagerId");
dtManagersEducation.Columns.Add("Institution");
dtManagersEducation.Columns.Add("DegreeType");
dtManagersEducation.Columns.Add("Emphasis");
dtManagersEducation.Columns.Add("Year");

The method that places the XML data is set up like this. Basically, I have certain points where DataRows are created and completed, and certain XML data is to be placed into the available row as the data is read.

public static void Read(MainForm mf, XmlReader xml)
{
    mainForm = mf;
    xmlReader = xml;

    while (xmlReader.Read() && mainForm.continueProcess)
    {
        if (xmlReader.Name == "CollegeEducation")
        {
            if (nodeIsElement())
            {
                drManagersEducation = dtManagersEducation.NewRow();
                drManagersEducation["ManagerId"] = currentManager.morningstarManagerId;
            }
            else if (nodeIsEndElement())
            {
                dtManagersEducation.Rows.Add(drManagersEducation);
                drManagersEducation = null;
            }
        }
        else if (xmlReader.Name == "School")
        {
            if (nodeIsElement() && drManagersEducation != null)
            {
                string value = xmlReader.ReadElementContentAsString();
                drManagersEducation["Institution"] = value;
            }
        }
        else if (xmlReader.Name == "Year")
        {
            if (nodeIsElement() && drManagersEducation != null)
            {
                string value = xmlReader.ReadElementContentAsString();
                drManagersEducation["Year"] = value;
            }
        }
        else if (xmlReader.Name == "Degree")
        {
            if (nodeIsElement() && drManagersEducation != null)
            {
                string value = xmlReader.ReadElementContentAsString();
                drManagersEducation["DegreeType"] = value;
            }
        }
        else if (xmlReader.Name == "Major")
        {
            if (nodeIsElement() && drManagersEducation != null)
            {
                string value = xmlReader.ReadElementContentAsString();
                drManagersEducation["Emphasis"] = value;
            }
        }
    }
}

private static bool nodeIsElement()
{
    return xmlReader.NodeType == XmlNodeType.Element;
}

private static bool nodeIsEndElement()
{
    return xmlReader.NodeType == XmlNodeType.EndElement;
}

The resulting table has no data in the Emphasis or Year columns, even though plenty of records in the XML (including the excerpt above) have values for these fields.

ManagerId    Institution        DegreeType   Emphasis    Year
5807         Yale University    M.S.
9336         Yale University
7227         Yale University    M.S.

Would you all happen to have some insight into what is going on?

Thanks

Edit: Answer

My sample XML above is indented, but the actual data I ran through the XmlReader was not. As dbc shows below, adding a bool readNext variable fixed my issue. As I understand it, ReadElementContentAsString() already advances the reader to the next node, so readNext is set to false after each call. Because my while loop condition is now (!readNext || xmlReader.Read()), the loop does not call Read() again on that iteration. This prevents ReadElementContentAsString() and Read() from being called back to back, so no nodes are skipped.

Thanks to dbc!

Answer

The problem you are seeing is that XmlReader.ReadElementContentAsString moves the reader past the end element tag. If you then call xmlReader.Read() unconditionally right afterwards, the node immediately after the end element tag is skipped. In the XML shown in your question, the node immediately after each end element tag is whitespace, so the bug isn't reproducible with the XML as posted. But if I strip the indentation (and hopefully your 2+GB XML file has no indentation), the bug becomes reproducible: with the element order in your sample, reading <School> skips <Year> and reading <Degree> skips <Major>, which matches your empty Year and Emphasis columns.
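
To see the skip in isolation, here is a minimal sketch that runs the same "Read() then ReadElementContentAsString()" pattern over one un-indented CollegeEducation node. The class name SkipDemo, the method NaiveLoop, and the sample string are made up for illustration and are not part of the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class SkipDemo
{
    // Un-indented XML: there are no whitespace nodes between the elements.
    public const string Xml =
        "<CollegeEducation><School>Yale University</School><Year>1990</Year>" +
        "<Degree>M.S.</Degree><Major>Business</Major></CollegeEducation>";

    // Returns the names of the leaf elements the naive loop actually handles.
    public static List<string> NaiveLoop(string xml)
    {
        var seen = new List<string>();
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            // Read() is called unconditionally on every iteration.
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name != "CollegeEducation")
                {
                    seen.Add(reader.Name);
                    // Consumes the element and leaves the reader on the node
                    // that follows the end tag (here: the next element).
                    reader.ReadElementContentAsString();
                }
            }
        }
        return seen;
    }
}
```

Calling SkipDemo.NaiveLoop(SkipDemo.Xml) handles only School and Degree. Year and Major are never seen as elements, because the Read() at the top of the loop steps from their start tags straight into their text content.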

Also, in your question, I don't see where you actually read the <ManagerId>7394</ManagerId> tag. Instead you just take it from currentManager.morningstarManagerId (an undefined global variable). I reckon that's a typo in your question, and in your actual code you read this somewhere.

Here's a version of your method that fixes these problems and can be compiled and tested standalone:

    public static DataTable Read(XmlReader xmlReader, Func<bool> continueProcess)
    {
        DataTable dtManagersEducation = new DataTable();
        dtManagersEducation.TableName = "ManagersEducation";

        dtManagersEducation.Columns.Add("ManagerId");
        dtManagersEducation.Columns.Add("Institution");
        dtManagersEducation.Columns.Add("DegreeType");
        dtManagersEducation.Columns.Add("Emphasis");
        dtManagersEducation.Columns.Add("Year");

        bool inManagerDetail = false;
        string managerId = null;
        DataRow drManagersEducation = null;

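        // readNext is false when the previous iteration already advanced the reader
        // (via ReadElementContentAsString), so Read() must not be called again.
        bool readNext = true;
        while ((!readNext || xmlReader.Read()) && continueProcess())
        {
            readNext = true;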
            if (xmlReader.NodeType == XmlNodeType.Element)
            {
                if (!xmlReader.IsEmptyElement)
                {
                    if (xmlReader.Name == "ManagerDetail")
                    {
                        inManagerDetail = true;
                    }
                    else if (xmlReader.Name == "ManagerId")
                    {
                        var value = xmlReader.ReadElementContentAsString();
                        readNext = false;
                        if (inManagerDetail)
                            managerId = value;
                    }
                    else if (xmlReader.Name == "School")
                    {
                        var value = xmlReader.ReadElementContentAsString();
                        readNext = false;
                        if (drManagersEducation != null)
                            drManagersEducation["Institution"] = value;
                    }
                    else if (xmlReader.Name == "Year")
                    {
                        var value = xmlReader.ReadElementContentAsString();
                        readNext = false;
                        if (drManagersEducation != null)
                            drManagersEducation["Year"] = value;
                    }
                    else if (xmlReader.Name == "Degree")
                    {
                        var value = xmlReader.ReadElementContentAsString();
                        readNext = false;
                        if (drManagersEducation != null)
                            drManagersEducation["DegreeType"] = value;
                    }
                    else if (xmlReader.Name == "Major")
                    {
                        var value = xmlReader.ReadElementContentAsString();
                        readNext = false;
                        if (drManagersEducation != null)
                            drManagersEducation["Emphasis"] = value;
                    }
                    else if (xmlReader.Name == "CollegeEducation")
                    {
                        if (managerId != null)
                        {
                            drManagersEducation = dtManagersEducation.NewRow();
                            drManagersEducation["ManagerId"] = managerId;
                        }
                    }
                }
            }
            else if (xmlReader.NodeType == XmlNodeType.EndElement)
            {
                if (xmlReader.Name == "ManagerDetail")
                {
                    inManagerDetail = false;
                    managerId = null;
                }
                else if (xmlReader.Name == "CollegeEducation")
                {
                    if (drManagersEducation != null)
                        dtManagersEducation.Rows.Add(drManagersEducation);
                    drManagersEducation = null;
                }
            }
        }

        return dtManagersEducation;
    }
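
As an aside, the same idea can be expressed without a flag by calling Read() only on iterations that did not consume an element. The following is a sketch of that alternative loop shape, not the method above: FlaglessLoop, ReadLeaves, and the wanted parameter are hypothetical names, and it assumes every name in wanted is a leaf element containing only text (ReadElementContentAsString throws if it meets child elements).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class FlaglessLoop
{
    // Hypothetical helper: collects (element name, text) pairs for the given leaf elements.
    public static List<KeyValuePair<string, string>> ReadLeaves(string xml, ISet<string> wanted)
    {
        var result = new List<KeyValuePair<string, string>>();
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            // EOF becomes true once Read() has run off the end of the document.
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && wanted.Contains(reader.Name))
                {
                    string name = reader.Name;
                    // Advances the reader past the end tag, so Read() is NOT called here.
                    string value = reader.ReadElementContentAsString();
                    result.Add(new KeyValuePair<string, string>(name, value));
                }
                else
                {
                    // Nothing was consumed on this iteration, so advance by one node.
                    reader.Read();
                }
            }
        }
        return result;
    }
}
```

With the un-indented sample and wanted = { "School", "Year", "Degree", "Major" }, this visits all four elements in order. A cancellation check such as continueProcess() would go into the while condition alongside !reader.EOF.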
