How do I find the elements before a particular element in XML in C#?

I have an XML in the following format:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE repub SYSTEM "C:\repub\Repub_V1.dtd">
<?xml-stylesheet href="C:\repub\repub.xsl" type="text/xsl"?>
<repubold>
    <head>
        <title>xxx</title>
    </head>
    <body>
        <sec>
            <title>First Title</title>
            <break name="1-1"/>
            <pps>This is an invalid text.</pps>
            <h1>
                <page num="1"/>First Heading
            </h1>
            <bl>This is another text</bl>
            <fig>
                <img src="images/img_1-1.jpg" alt=""/>
                <fc>This is a caption</fc>
            </fig>
            <p>
                <bold>This</bold> again
                <br/> is
                <br/>
                <bold> a 
                    <br/>paragraph
                </bold>
            </p>
        </sec>
        <sec>
            <title>Second Title</title>
            <break name="2-1"/>
            <h1>
                <page num="1"/>Second Heading
            </h1>
            <bl>This is another text</bl>
            <fig>
                <img src="images/img_2-1.jpg" alt=""/>
                <fc>This is a caption</fc>
                <cr>This is a credit</cr>
            </fig>
            <p>This is a paragraph</p>
        </sec>
        <sec>
            <title>First Title</title>
            <break name="3-1"/>
            <h1>
                <page num="1"/>Third Heading
            </h1>
            <bl>This is another text</bl>
            <fig>
                <img src="images/img_3-1.jpg" alt=""/>
                <fc>This is a caption</fc>
            </fig>
            <p>This is a paragraph</p>
        </sec>
        <sec>
            <title>Third Title</title>
            <break name="4-1"/>
            <h1>
                <page num="1"/>Fourth Heading
            </h1>
            <bl>This is another text</bl>
            <p>This is a paragraph</p>
            <fig>
                <img src="images/img_4-1.jpg" alt=""/>
                <fc>This is a caption</fc>
                <cr>This is a credit</cr>
            </fig>
            <break name="5-1"/>
            <h1>
                <page num="1"/>Fifth Heading
            </h1>
            <bl>This is another text</bl>
            <fig>
                <img src="images/img_5-1.jpg" alt=""/>
                <fc>This is a caption</fc>
                <cr>This is a credit</cr>
            </fig>
            <p>This is a paragraph</p>
        </sec>
    </body>
</repubold>

Here, every <break> tag is followed by an <h1>. I want to check the elements that come before each <h1>, if any. <psf> is the only tag allowed between <break> and <h1>: there can be a <psf> or nothing at all, but if any other tag (say <xyz>) appears there, I want to show an error.

Please help.

I have tried this, but the code is not working:

var pagetag = xdoc.Descendants("break").Descendants("h1")
.Where(br => br.ElementsBeforeSelf("h1") != new XElement("psf") ||                                                                 
br.ElementsBeforeSelf("h1") != new XElement("break"))
.Select(br => br.Attribute("name").Value.Trim())
.Aggregate((a, b) => a + ", " + b);

MessageBox.Show("The following articles have invalid tags before <h1>: " + pagetag);

The first problem is that ElementsBeforeSelf() returns a sequence of elements, but you're checking whether that sequence is equal to a single, newly created XElement, and != compares them by reference, so the condition is always true.
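
To make that concrete, here is a minimal sketch of the difference between comparing references and comparing element names. The class name ReferenceComparisonDemo and the helper PreviousSiblingName are made up for illustration, and the tiny inline XML is an assumption standing in for the real file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ReferenceComparisonDemo
{
    // Hypothetical helper: local name of the sibling element immediately
    // before the given element, or null if there is none.
    public static string PreviousSiblingName(XElement element)
    {
        return element.ElementsBeforeSelf().LastOrDefault()?.Name.LocalName;
    }

    public static void Main()
    {
        XElement sec = XElement.Parse(
            "<sec><break name=\"1-1\"/><pps>text</pps><h1>Heading</h1></sec>");
        XElement h1 = sec.Element("h1");

        // A sequence of siblings is never the same object as a brand-new
        // element, so a reference inequality test is always true.
        IEnumerable<XElement> before = h1.ElementsBeforeSelf();
        Console.WriteLine(!object.ReferenceEquals(before, new XElement("psf"))); // True

        // Comparing names is what actually answers the question.
        Console.WriteLine(PreviousSiblingName(h1));          // pps
        Console.WriteLine(PreviousSiblingName(h1) == "psf"); // False
    }
}
```

Note also that ElementsBeforeSelf("h1") only returns preceding siblings that are themselves named h1, which is probably not what the original query intended.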

You're also asking for the descendants of break elements - and there aren't any. I think you just want all the h1 elements.
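
As a quick check of that point, the sketch below counts the h1 elements found each way. The class and method names are hypothetical, and the inline XML is a cut-down assumption of the real document: break is an empty element and h1 is its sibling, not its child.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class DescendantsDemo
{
    // Hypothetical helper: counts h1 elements nested inside break elements
    // versus h1 elements anywhere in the document.
    public static (int Nested, int All) CountH1(XDocument doc)
    {
        int nested = doc.Descendants("break").Descendants("h1").Count();
        int all = doc.Descendants("h1").Count();
        return (nested, all);
    }

    public static void Main()
    {
        XDocument doc = XDocument.Parse(
            "<sec><break name=\"1-1\"/><h1>A</h1><break name=\"2-1\"/><h1>B</h1></sec>");

        var (nested, all) = CountH1(doc);
        Console.WriteLine(nested); // 0, because break has no children
        Console.WriteLine(all);    // 2
    }
}
```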

To clarify your requirement, I think you're trying to find all the h1 elements where the last sibling element before the h1 is neither break nor psf. For each of those elements, you want to find the nearest break element before the h1 (if there is one) and report its name attribute.

Assuming that's the case, here's some code which I believe does what you want, with comments explaining it:

using System;
using System.Linq;
using System.Xml.Linq;

public class Test
{
    public static void Main()
    {
        var xdoc = XDocument.Load("test.xml");
        XName brName = "break";
        XName psfName = "psf";

        var invalidNames = 
            from h1 in xdoc.Descendants("h1")
            // Find the last sibling element before the h1
            let previous = h1.ElementsBeforeSelf().LastOrDefault()
            // It's invalid if there isn't a previous element, or it has
            // a name other than break or psf
            where previous?.Name != brName && previous?.Name != psfName
            // Get the name to report, handling the case where there's
            // no previous break or no "name" attribute
            select ((string) h1.ElementsBeforeSelf(brName).LastOrDefault()?.Attribute("name")) ?? "(no named break)";

        Console.WriteLine(string.Join(", ", invalidNames));
    }
}

It has a bit of a flaw: if an <h1> is invalid but has no <break> immediately before it, the code looks back as far as the nearest earlier <break> to find a name. For example, if you remove the <break name="5-1"/> element, the "Fifth Heading" <h1> becomes invalid and is reported as "4-1", because that's the last break element before it. I don't know how important that is to you.
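
If that matters, one possible workaround is to stop the backwards search at the previous h1, so that a break belonging to an earlier heading is never reported. The following is only a sketch under that assumption: it keeps the same validity rule (the previous sibling must be break or psf), and the class name, method name and fallback message are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class HeadingCheck
{
    // Hypothetical helper, not part of the code above.
    public static List<string> FindInvalidHeadings(XDocument doc)
    {
        XName brName = "break", psfName = "psf", h1Name = "h1";
        var report = new List<string>();

        foreach (XElement h1 in doc.Descendants(h1Name))
        {
            // Same rule as before: the previous sibling must be a break
            // or a psf element, otherwise the h1 is invalid.
            XElement previous = h1.ElementsBeforeSelf().LastOrDefault();
            if (previous?.Name == brName || previous?.Name == psfName)
            {
                continue;
            }

            // Walk backwards from the h1, but stop at the previous h1 so a
            // break that belongs to an earlier heading is not picked up.
            XElement owner = h1.ElementsBeforeSelf()
                .Reverse()
                .TakeWhile(e => e.Name != h1Name)
                .FirstOrDefault(e => e.Name == brName);

            // Fall back to the heading text when no break (or no name) is found.
            report.Add((string)owner?.Attribute("name")
                ?? "(no break before \"" + h1.Value.Trim() + "\")");
        }

        return report;
    }

    public static void Main()
    {
        // Cut-down stand-in for the fourth <sec> with <break name="5-1"/> removed.
        XDocument doc = XDocument.Parse(
            "<sec><break name=\"4-1\"/><h1>A</h1><bl>x</bl><fig/><h1>B</h1></sec>");

        Console.WriteLine(string.Join(", ", FindInvalidHeadings(doc)));
        // prints: (no break before "B")
    }
}
```

With the original code, the same input would report "4-1" for the second heading.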

