Querying XML tree with Linq

I'm trying to parse a complex XML file using LINQ. The file contains thousands of records, each with hundreds of fields. I need to extract certain pieces of information about each drug and store them in a database.

Edit: I'm very sorry, all, but the originally posted XML was not accurate. I was unaware that the attributes on the root element would alter the process. I've updated the question to show the true structure of the XML file.

Here's a sample of the XML:

<drugs xmlns:xs="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://drugbank.ca" xs:schemaLocation="http://www.drugbank.ca/docs/drugbank.xsd" schemaVersion="1.4">
   <drug>
      <name>foo</name>
      <indication>Some info here</indication>
      <half-life>1 to 3 hours</half-life>
      <protein-binding>90%</protein-binding>
      <!-- hundreds of other elements -->
      <properties>
         <property>
            <kind>logP/hydrophobicity</kind>
            <value>-0.777</value>
         </property>
         <property>
            <kind>Molecular Weight</kind>
            <value>6963.4250</value>
         </property>
         <property>
            <kind>Molecular Formula</kind>
            <value>C287H440N80O110S6</value>
         </property>
         <!-- dozens of other properties -->
      </properties>
   </drug>
   <!-- thousands more drugs -->
</drugs>

I'm pretty fuzzy on the actual querying, as this is my first time working with LINQ. I'm familiar with SQL, so the concept of complex queries isn't difficult for me, but I haven't been able to find documentation I can follow that helps with this issue. The query I have so far is as follows:

XDocument xdoc = XDocument.Load(@"drugbank.xml");

var d = from drugs in xdoc.Descendants("drug")
        select new
        {
            name = drugs.Element("name").Value,
            indication = drugs.Element("indication").Value,
            halflife = drugs.Element("half-life").Value,
            proteinBinding = drugs.Element("protein-binding").Value,
        };

The first issue is (theoretically) resolved. On to...

The second issue is that I need to extract some of the properties (namely hydrophobicity, molecular weight, and molecular formula), but the property kind and the property value are stored in two different XElements. How can I get the property values for only the kinds that I care about?

I pasted your code and ran it. Output:

foo
Some info here
1 to 3 hours
90%

Just as expected.
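That result is what you would see for XML without a default namespace. Once the root element declares xmlns="http://drugbank.ca", as in the updated sample, every child element belongs to that namespace and an unqualified name such as "drug" no longer matches. A minimal sketch of the difference, assuming a trimmed-down inline document in place of the real file (the class and method names are made up for illustration):

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class NamespaceDemo
{
    // Trimmed-down stand-in for the real file; only the default namespace matters here.
    public const string SampleXml =
        @"<drugs xmlns=""http://drugbank.ca"">
            <drug><name>foo</name></drug>
          </drugs>";

    // Counts the descendants of the document that match the given (possibly qualified) name.
    public static int CountMatches(string xml, XName name)
    {
        return XDocument.Parse(xml).Descendants(name).Count();
    }

    public static void Main()
    {
        XNamespace ns = "http://drugbank.ca";

        // The string "drug" converts to an XName with no namespace, so nothing matches.
        Console.WriteLine("unqualified: " + CountMatches(SampleXml, "drug"));

        // ns + "drug" builds the namespace-qualified XName that the elements actually have.
        Console.WriteLine("qualified:   " + CountMatches(SampleXml, ns + "drug"));
    }
}
```

With this input the unqualified lookup finds 0 elements and the qualified one finds 1, which is why the original query returns nothing against the real file.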

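You can use a subquery to collect the properties into a property of the outer anonymous object. Because the root element declares a default namespace, every element name has to be qualified with it. If you want the properties nested: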

XNamespace defaultNS = "http://drugbank.ca";

var d = from drugs in xdoc.Descendants(defaultNS + "drug")
        select new
        {
            name = drugs.Element(defaultNS + "name").Value,
            indication = drugs.Element(defaultNS + "indication").Value,
            halflife = drugs.Element(defaultNS + "half-life").Value,
            proteinBinding = drugs.Element(defaultNS + "protein-binding").Value,
            Properties = (from property in drugs.Element(defaultNS + "properties").Elements(defaultNS + "property")
                          let kind = property.Element(defaultNS + "kind").Value
                          where kind == "logP/hydrophobicity" || kind == "Molecular Weight" || kind == "Molecular Formula"
                          select new { Kind = kind, Value = property.Element(defaultNS + "value").Value })
        };

Or flattened:

XNamespace defaultNS = "http://drugbank.ca";

var d = from drugs in xdoc.Descendants(defaultNS + "drug")
        let properties = drugs.Element(defaultNS + "properties").Elements(defaultNS + "property")
        select new
        {
            name = drugs.Element(defaultNS + "name").Value,
            indication = drugs.Element(defaultNS + "indication").Value,
            halflife = drugs.Element(defaultNS + "half-life").Value,
            proteinBinding = drugs.Element(defaultNS + "protein-binding").Value,
            hydrophobicity = (from property in properties
                          let kind = property.Element(defaultNS + "kind").Value
                          where kind == "logP/hydrophobicity"
                          select property.Element(defaultNS + "value").Value).FirstOrDefault(),
            molecularWeight = (from property in properties
                          let kind = property.Element(defaultNS + "kind").Value
                          where kind == "Molecular Weight"
                          select property.Element(defaultNS + "value").Value).FirstOrDefault(),
            molecularFormula = (from property in properties
                          let kind = property.Element(defaultNS + "kind").Value
                          where kind == "Molecular Formula"
                          select property.Element(defaultNS + "value").Value).FirstOrDefault()
        };
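If more kinds are needed later, repeating one subquery per kind gets verbose. One possible alternative, sketched here rather than taken from the original answer, is to read each drug's kind/value pairs into a dictionary once and look values up by kind. The class name PropertyLookupDemo and the method ReadProperties are hypothetical, and the inline XML is a cut-down copy of the sample from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class PropertyLookupDemo
{
    static readonly XNamespace Ns = "http://drugbank.ca";

    // Cut-down copy of the sample document from the question.
    public const string SampleXml =
        @"<drugs xmlns=""http://drugbank.ca"">
            <drug>
              <name>foo</name>
              <properties>
                <property><kind>logP/hydrophobicity</kind><value>-0.777</value></property>
                <property><kind>Molecular Weight</kind><value>6963.4250</value></property>
                <property><kind>Molecular Formula</kind><value>C287H440N80O110S6</value></property>
              </properties>
            </drug>
          </drugs>";

    // Maps each <kind> to its <value> for one <drug> element.
    // Returns an empty dictionary when <properties> is missing; the first value wins on duplicate kinds.
    public static Dictionary<string, string> ReadProperties(XElement drug)
    {
        var result = new Dictionary<string, string>();
        foreach (XElement property in drug.Elements(Ns + "properties").Elements(Ns + "property"))
        {
            // The explicit string cast yields null instead of throwing when the element is absent.
            string kind = (string)property.Element(Ns + "kind");
            if (kind != null && !result.ContainsKey(kind))
            {
                result[kind] = (string)property.Element(Ns + "value");
            }
        }
        return result;
    }

    public static void Main()
    {
        XDocument xdoc = XDocument.Parse(SampleXml);
        foreach (XElement drug in xdoc.Descendants(Ns + "drug"))
        {
            Dictionary<string, string> props = ReadProperties(drug);
            string weight;
            props.TryGetValue("Molecular Weight", out weight); // weight stays null if the kind is missing
            Console.WriteLine((string)drug.Element(Ns + "name") + ": " + weight);
        }
    }
}
```

The explicit (string) cast on an XElement returns null for a missing element, whereas .Value throws a NullReferenceException, which may matter when not every drug in the file carries every field.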

Also, a very useful reference for learning LINQ is 101 LINQ Samples.

