
How can I pull in all xml nodes that are one of two types of nodes with Linq-to-xml?

I am parsing HTML using Linq-to-xml. Right now, to get the paragraph tags, I'm using the following code:

        var paragraphs = contentDiv.Parent.Parent.Parent.Parent.Parent.Elements("p").ToList();

However, one of the sites I am parsing has p tags with ul tags after them, so the markup looks like:

<p>...</p>
<ul><li>...</li></ul>
<p>...</p>
<ul><li>...</li></ul>
<p>...</p>
<ul><li>...</li></ul>
<p>...</p>
<ul><li>...</li></ul>

I need to get all the text inside all p tags and inside all ul tags but I need the content in the order that it appears in the HTML. Essentially I'd like something similar to:

        var paragraphs = contentDiv.Parent.Parent.Parent.Parent.Parent.Elements("p" || "ul").ToList();

How would I go about doing this?

And no, these P and UL tags are not sectioned off by themselves, so I can't just get all content in that parent XElement.

Sounds like you want

contentDiv.Parent.Parent.Parent.Parent.Parent.Elements()
          .Where(x => x.Name.LocalName == "p" || x.Name.LocalName == "ul")
          .ToList();
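
For anyone who wants to try this outside the original page, here is a minimal, self-contained sketch of the same filter. The sample markup, the `parent` variable, and the `wanted` set are made up for illustration (they stand in for `contentDiv.Parent.Parent.Parent.Parent.Parent`), and the case-insensitive comparison is an assumption about how the tag names might be cased in the parsed HTML. `Elements()` returns direct children in document order, so filtering it keeps the p and ul elements in the order they appear.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical stand-in for the parent element from the question.
var parent = XElement.Parse(
    "<div><h1>Title</h1><p>first</p><ul><li>a</li><li>b</li></ul>" +
    "<p>second</p><ul><li>c</li></ul></div>");

// Tag names to keep. The case-insensitive comparer is an assumption,
// in case the source markup uses <P> or <UL>.
var wanted = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "p", "ul" };

// Elements() yields direct children in document order,
// so the filtered list preserves the order from the markup.
List<XElement> blocks = parent.Elements()
    .Where(e => wanted.Contains(e.Name.LocalName))
    .ToList();

foreach (var block in blocks)
{
    // Value concatenates all descendant text of the element.
    Console.WriteLine($"{block.Name.LocalName}: {block.Value}");
}
```

Comparing on `Name.LocalName` rather than `Name` means the filter still matches when the document declares a default namespace, as XHTML pages often do.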

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.
