
C# LINQ XML parsing using “PreviousNode”

With quite some help from SO, I managed to put together the following LINQ expression.

var parentids = xliff.Descendants()
                     .Elements(xmlns + "trans-unit")
                     .Elements(xmlns + "seg-source")
                     .Elements(xmlns + "mrk")
                     .Where(e => e.Attribute("mtype").Value == "seg")
                     .Select(item => (XElement)item.Parent.Parent.PreviousNode)
                         .Where(item => item != null)
                         .Select(item => item.Elements(xmlns + "source")
                             .Where(itema => itema != null)
                             .Select(itemb => itemb.Elements(xmlns + "x")             
                             .LastOrDefault()
                             .Attribute("id")
                             .Value.ToString())).ToArray();

It locates each mrk element that has @mtype="seg", goes up to its trans-unit ancestor (Parent.Parent), and checks whether the previous sibling trans-unit has a child trans. If it does not, the query returns the @id of the last x element in that sibling's source child; otherwise it returns null. (It must return null in that case; it cannot simply omit the match.)

I should add that although the samples below contain only one such previous node without a trans element, the real-life XML contains many more, so I must use PreviousNode.

Here is the XML it works with; for this input it correctly returns "2":

<?xml version="1.0" encoding="utf-8"?>
<xliff xmlns:sdl="http://sdl.com/FileTypes/SdlXliff/1.0" version="1.2" sdl:version="1.0" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="Pasadena_Internet_2016.xml" source-language="en-US" datatype="x-sdlfilterframework2" target-language="da-DK">
    <body>
      <trans-unit id="d679cb2d-ecba-47ba-acb7-1bb4a798c755" translate="no">
        <source>
          <x id="0" />
          <x id="1" />
          <x id="2" />
        </source>
      </trans-unit>
      <trans-unit id="aed9fde2-fd1b-4eba-bfc9-06d325aa7047">
        <source>
          <x id="3" />Pasadena, California’s iconic Colorado Boulevard <x id="4" />has been the site of the world-famous Tournament of Roses Parade since it began in 1890.
        </source>
        <seg-source>
          <mrk mtype="seg" mid="1">
            <x id="3" />Pasadena, California’s iconic Colorado Boulevard <x id="4" />has been the site of the world-famous Tournament of Roses Parade since it began in 1890.
          </mrk>
        </seg-source>
        <target>
          <mrk mtype="seg" mid="1">
            <x id="3" /><x id="4" />Pasadena, Californiens ikoniske Colorado Boulevard har været stedet for den verdensberømte Rose Bowl-parade siden den begyndte i 1890.
          </mrk>
        </target>
      </trans-unit>
    </body>
  </file>
</xliff>

The last problem I need to solve is that there is another type of XML in which the starting trans-unit is wrapped in a group element that is not present in the first XML. There I have to go up one more parent and then take the previous trans-unit sibling, the one right before the group.

I am trying to build this into the same LINQ expression so it handles both scenarios.

In fact, if I change line 6 of the expression to this, it works:

.Select(item => (XElement)item.Parent.Parent.Parent.PreviousNode)
//                                          ^------ additional Parent

Here is the other XML, which currently throws an exception with the above code but should return "0":

<?xml version="1.0" encoding="utf-8"?>
<xliff xmlns:sdl="http://sdl.com/FileTypes/SdlXliff/1.0" xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2" sdl:version="1.0">
  <file original="Internet_Anti-DrugIntro2015.xml_1457007.xlf" datatype="x-sdlfilterframework2" source-language="en-US" target-language="hu-HU">
    <body>
      <trans-unit translate="no" id="c3a13bfb-ed51-49cf-8278-e2c86c2114c0">
        <source>
          <x id="0"/>
        </source>
      </trans-unit>
      <group>
        <sdl:cxts>
          <sdl:cxt id="1"/>
        </sdl:cxts>
        <trans-unit id="3b4520df-4483-4c9e-8a9b-ce2544269f3e">
          <source>
            <x id="1"/>
          </source>
          <seg-source>
            <mrk mtype="seg" mid="2">
              <x id="1"/>Drugs are robbing our children of their future.
            </mrk>
            <mrk mtype="seg" mid="3">
              <x id="2"/>Every 17 seconds a teenager experiments with an illicit drug for the first time.
            </mrk>
          </seg-source>
          <target>
            <mrk mtype="seg" mid="2">
              <x id="1"/>A drogok megfosztják gyermekeinket a jövőjüktől.
            </mrk>
            <mrk mtype="seg" mid="3">
              <x id="2"/>17 másodpercenként egy újabb tizenéves próbálja ki először a kábítószereket.
            </mrk>
          </target>
        </trans-unit>
      </group>
      <trans-unit translate="no" id="7890462c-edcb-4fe6-9192-033ba76d9942">
        <source>
          <x id="183"/>
        </source>
      </trans-unit>
    </body>
  </file>
</xliff>

I would greatly appreciate any help.

Instead of navigating up the XML tree with a different number of Parent calls depending on the XML structure, you can use Ancestors() with Last() to find the highest-level ancestor named either "trans-unit" or "group", and then navigate to its previous node. Ancestors() returns the nearest ancestor first, so Last() with a predicate picks the outermost match.

Try replacing this part:

.Select(item => (XElement) item.Parent.Parent.PreviousNode)

with this one:

.Select(item => (XElement)item.Ancestors()
                              .Last(o => new[]{"trans-unit","group"}.Contains(o.Name.LocalName))
                              .PreviousNode)
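
To see the idea end to end, here is a minimal, self-contained sketch rather than a drop-in replacement for the query in the question. The class name XliffSketch and the method PrecedingIds are hypothetical names made up for illustration. It also makes a few assumptions that go beyond the snippet above: it compares full namespaced names instead of LocalName, uses `as XElement` instead of a hard cast so that a non-element previous node gives null rather than an exception, and uses null-conditional operators so that a missing source, x, or id also gives null.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XliffSketch
{
    // XLIFF 1.2 default namespace, as declared in the sample documents.
    static readonly XNamespace Ns = "urn:oasis:names:tc:xliff:document:1.2";

    // Hypothetical helper: for every <mrk mtype="seg"> inside a <seg-source>,
    // returns the id of the last <x> in the <source> of the element that
    // precedes the outermost <trans-unit>/<group> ancestor, or null if
    // any step of that lookup finds nothing.
    public static string[] PrecedingIds(XDocument xliff)
    {
        return xliff.Descendants(Ns + "seg-source")
            .Elements(Ns + "mrk")
            .Where(m => (string)m.Attribute("mtype") == "seg")
            // Ancestors() yields the nearest ancestor first, so the last
            // match is the outermost <trans-unit> or <group>.
            .Select(m => m.Ancestors()
                .LastOrDefault(a => a.Name == Ns + "trans-unit" || a.Name == Ns + "group"))
            // "as" gives null when the previous node is missing or not an element.
            .Select(top => top?.PreviousNode as XElement)
            // The XAttribute-to-string conversion returns null for a null attribute.
            .Select(prev => (string)prev?.Elements(Ns + "source")
                .Elements(Ns + "x")
                .LastOrDefault()
                ?.Attribute("id"))
            .ToArray();
    }
}
```

Assuming the documents are loaded with the default options (so whitespace-only text nodes are not kept), calling XliffSketch.PrecedingIds on the first sample should give a single "2", and on the second sample one "0" for each of the two matching mrk elements.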
