How to make this xml query shorter

I spent three days getting this XML file read and its details put into the database. It works as it should, but I know the way I read the file is not the proper way.

If the XML file is bigger than 2 MB (about 1000 records), it takes more than a minute to load.

Can you please show me how to make this query shorter?

This is the XML:

<?xml version="1.0" encoding="UTF-8"?>
<outputTree xmlns="http://www.ibm.com/software/analytics/spss/xml/oms" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/software/analytics/spss/xml/oms http://www.ibm.com/software/analytics/spss/xml/oms/spss-output-1.8.xsd">
    <command command="Summarize" displayOutlineValues="label" displayOutlineVariables="label" displayTableValues="label" displayTableVariables="label" lang="en" text="Summarize">
        <pivotTable subType="Report" text="Batch">
            <dimension axis="row" text="Cases">
                <group label="Test Site" text="Test Site" varName="PLANT_DESC" variable="true">
                    <group hide="true" text="A">
                        <group string="A" text="A" varName="PLANT_DESC">
                            <group label="Product" text="Product" varName="PROD_DESC" variable="true">
                                <group hide="true" text="A">
                                    <group string="S" text="S" varName="PROD_DESC">
                                        <group label="Batch Number" text="Batch Number" varName="BATCH_NO" variable="true">
                                            <group hide="true" text="A">
                                                <group number="3704542" text="3704542" varName="BATCH_NO">
                                                    <category number="1" text="1">
                                                        <dimension axis="column" text="Variables">
                                                            <category label="Batch Run" text="Batch Run" varName="BATCH_RUN_ID" variable="true">
                                                                <cell number="4202" text="4202" varName="BATCH_RUN_ID"/>
                                                            </category>
                                                            <category label="Application" text="Application" varName="APP_ID" variable="true">
                                                                <cell label="Calibration" number="101" text="Calibration" varName="APP_ID"/>
                                                            </category>
                                                            <category label="Date Tested" text="Date Tested" varName="TEST_DATE" variable="true">
                                                                <cell date="2014-09-23T10:53:19" format="date" text="23-SEP-2014" varName="TEST_DATE"/>
                                                            </category>
                                                        </dimension>
                                                    </category>
                                                </group>            
                                            </group>
                                        </group>
                                    </group>                                            
                                </group>
                            </group>
                        </group>
                    </group>
                </group>
            </dimension>
        </pivotTable>
    </command>
</outputTree>

This is the C#:

XElement root = XElement.Load(Page.Server.MapPath(@"oril.xml"));
XNamespace ad = "http://www.ibm.com/software/analytics/spss/xml/oms";

var cats = from cat in root.Descendants(ad + "dimension").Where
               (cat => (string)cat.Attribute("axis") == "column" && (string)cat.Attribute("text") == "Variables")

           select new
           {
               BATCH_NO = cat.Parent.Parent.Attribute("number").Value,
               RUN_NO = cat.Parent.Attribute("number").Value,

               //// 1
               BATCH_RUN_ID = cat.Descendants(ad + "category").Elements(ad + "cell")
                    .Where(a => (string)a.Attribute("varName") == "BATCH_RUN_ID")
                    .Select(c => c.Attribute("number").Value),

               //// 2
               APP_ID = cat.Descendants(ad + "category").Elements(ad + "cell")
                    .Where(a => (string)a.Attribute("varName") == "APP_ID")
                    .Select(c => c.Attribute("label").Value),

               //// 3
               TEST_DATE = cat.Descendants(ad + "category").Elements(ad + "cell")
                       .Where(a => (string)a.Attribute("varName") == "TEST_DATE")
                       .Select(c => c.Attribute("date").Value),
               ////
               //// Another 12
               ////
           };

foreach (var cat in cats)
{
    foreach (string s in cat.BATCH_RUN_ID)
    {
        xmlTitle.Text += "BATCH_NO: " + cat.BATCH_NO + " </br>";
        xmlTitle.Text += "RUN_NO: " + cat.RUN_NO + " </br>";
        xmlTitle.Text += "BATCH_RUN_ID: " + s + " </br>";
    }

    foreach (string s in cat.APP_ID)
    {
        xmlTitle.Text += "APP_ID: " + s + " </br>";
        i_APP_ID = s;
    }
    foreach (string s in cat.TEST_DATE)
    {
        xmlTitle.Text += "TEST_DATE: " + s + " </br>";
        i_TEST_DATE = s;
    }
    foreach (string s in cat.CB_USED)
    {
        xmlTitle.Text += "CB_USED: " + s + " </br>";
        i_CB_USED = s;
    }
    ////
    //// Another 12
    ////
}

Since C# is an object-oriented language, you could wrap the elements in small classes to ease some of your .Descendants().Elements() pain.

public class Category
{
    public readonly XElement self;
    public readonly XNamespace ns;
    public Category(XNamespace xn, XElement cat) { self = cat; ns = xn; }
    public string Name { get { return (string)self.Attribute("varName"); } }
    public Cell Cell { get { return _Cell ?? (_Cell = new Cell(self.Elements(ns+"cell").First())); } }
    Cell _Cell;
}

public class Cell
{
    public readonly XElement self;
    public Cell(XElement cell) { self = cell; }
    public string Name { get { return (string)self.Attribute("varName"); } }
    public string Number { get { return (string)self.Attribute("number"); } }
    public string Date { get { return (string)self.Attribute("date"); } }
    public string Label { get { return (string)self.Attribute("label"); } }
}

public class Dimension
{
    public readonly XElement self;
    public readonly XNamespace ns;
    public Dimension(XNamespace xn, XElement dim) { self = dim; ns = xn; }
    public string Axis { get { return (string)self.Attribute("axis"); } }
    public string Text { get { return (string)self.Attribute("text"); } }
    public string BatchNo { get { return self.Parent.Parent.Attribute("number").Value; } }
    public string RunNo { get { return self.Parent.Attribute("number").Value; } }
    public Category[] Categories
    { get { return _Categories ?? (_Categories = self.Elements(ns + "category")
                             .Select(cat => new Category(ns, cat))
                             .ToArray()); }
    }
    Category[] _Categories;
}

Then use these classes with the root and ad defined in your post. If nothing else, this is more readable, and it should also be faster: once a Category has created its Cell, it does not have to search for it again on every Cell access, and likewise for the categories of each Dimension.

var dims = root.Descendants(ad + "dimension")
               .Select(dim => new Dimension(ad, dim))
               .Where(Dim => Dim.Axis == "column" && Dim.Text == "Variables");
var cats = dims.Select(dim => new
{
    BATCH_NO = dim.BatchNo,
    RUN_NO = dim.RunNo,

    //// 1
    BATCH_RUN_ID = dim.Categories
                      .Where(cat => cat.Name == "BATCH_RUN_ID")
                      .Select(cat => cat.Cell.Number),
    //// 2
    APP_ID = dim.Categories
                      .Where(cat => cat.Name == "APP_ID")
                      .Select(cat => cat.Cell.Label),

    //// etc
});

P.S. I typed this by hand, so it may not compile as is, but any problem should be something simple like a missing semicolon.
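
Going one step further along the same lines, the repeated per-field sub-queries could be replaced by indexing each column dimension's cells by varName once and then looking values up by key. This is only a minimal sketch, not code from the question or the answer above: the names BatchXml, Row, ReadRows, and Get are hypothetical, and it assumes each varName occurs at most once per dimension and that every column dimension is nested under a category and a group as in the sample XML.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical holder for one <dimension axis="column" text="Variables">.
public sealed class Row
{
    public string BatchNo;
    public string RunNo;
    public Dictionary<string, XElement> Cells;

    // Returns the named attribute of the cell for varName,
    // or null when the cell or the attribute is missing.
    public string Get(string varName, string attribute)
    {
        XElement cell;
        return Cells.TryGetValue(varName, out cell)
            ? (string)cell.Attribute(attribute)
            : null;
    }
}

public static class BatchXml
{
    static readonly XNamespace Oms =
        "http://www.ibm.com/software/analytics/spss/xml/oms";

    public static List<Row> ReadRows(XElement root)
    {
        return root.Descendants(Oms + "dimension")
            .Where(d => (string)d.Attribute("axis") == "column"
                     && (string)d.Attribute("text") == "Variables")
            .Select(d => new Row
            {
                // Same parent walk as in the question's query.
                BatchNo = (string)d.Parent.Parent.Attribute("number"),
                RunNo = (string)d.Parent.Attribute("number"),
                // Index the cells once; assumes varName is unique per dimension.
                Cells = d.Elements(Oms + "category")
                         .Elements(Oms + "cell")
                         .Where(c => c.Attribute("varName") != null)
                         .ToDictionary(c => (string)c.Attribute("varName"))
            })
            .ToList();
    }
}
```

With this, each of the fifteen fields becomes a one-line lookup such as row.Get("BATCH_RUN_ID", "number"), row.Get("APP_ID", "label"), or row.Get("TEST_DATE", "date"), and each dimension's cells are walked only once.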

First things first:

When you need to concatenate a lot of strings in a loop like that, use a StringBuilder instead of repeated string + string.

Example:
// requires: using System.Text;
StringBuilder sb = new StringBuilder();

foreach (var cat in cats)
{
    foreach (string s in cat.BATCH_RUN_ID)
    {
        //xmlTitle.Text += "BATCH_NO: " + cat.BATCH_NO + " </br>";
        sb.Append("BATCH_NO: ");
        sb.Append(cat.BATCH_NO);
        sb.Append(" </br>");
        // more and more, without using String + String
    }
}

// at the end of the loop, assign the result to the control's text once
xmlTitle.Text = sb.ToString();
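
For a version that can be run outside the page, here is a small self-contained sketch of the same idea. The ReportBuilder class and its Render method are hypothetical names used only for illustration, and the " </br>" separator is copied from the question unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ReportBuilder
{
    // Builds the whole report in one buffer instead of growing a string
    // (or a control's Text property) on every loop iteration.
    public static string Render(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var sb = new StringBuilder();
        foreach (var field in fields)
        {
            sb.Append(field.Key)
              .Append(": ")
              .Append(field.Value)
              .Append(" </br>");
        }
        return sb.ToString();
    }
}
```

The result would then be assigned once, for example xmlTitle.Text = ReportBuilder.Render(fields), so the control's text is written a single time rather than once per field.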
