I am parsing HTML using LINQ to XML. Right now, to get the paragraph tags, I'm using the following code:
var paragraphs = contentDiv.Parent.Parent.Parent.Parent.Parent.Elements("p").ToList();
However, one of the sites I am parsing has p tags with ul tags after them. So the markup is like:
<p>...</p>
<ul><li>...</li></ul>
<p>...</p>
<ul><li>...</li></ul>
<p>...</p>
<ul><li>...</li></ul>
<p>...</p>
<ul><li>...</li></ul>
I need to get all the text inside all p tags and inside all ul tags, but I need the content in the order that it appears in the HTML. Essentially I'd like something similar to:
var paragraphs = contentDiv.Parent.Parent.Parent.Parent.Parent.Elements("p" || "ul").ToList();
How would I go about doing this?
And no, these P and UL tags are not sectioned off by themselves, so I can't just get all content in that parent XElement.
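Sounds like you want to enumerate all the child elements, which Elements() returns in document order, and filter them by name: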
contentDiv.Parent.Parent.Parent.Parent.Parent.Elements()
.Where(x => x.Name.LocalName == "p" || x.Name.LocalName == "ul")
.ToList();
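For anyone who wants to try the filter in isolation, here is a minimal sketch of the same idea wrapped in a helper. The class name OrderedElements, the method name ChildrenNamed, and the sample markup are made up for illustration; the case-insensitive name comparison is also an assumption added here (the snippet above compares names exactly), on the guess that scraped HTML may mix "P" and "p".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class OrderedElements
{
    // Hypothetical helper: returns the direct children of `parent` whose
    // local name is one of `names`, in the order they appear in the document.
    public static List<XElement> ChildrenNamed(XElement parent, params string[] names)
    {
        // Assumption: ignore case when matching tag names.
        var wanted = new HashSet<string>(names, StringComparer.OrdinalIgnoreCase);

        // Elements() with no argument yields every child element in document
        // order, so filtering it keeps the p/ul interleaving intact.
        return parent.Elements()
                     .Where(e => wanted.Contains(e.Name.LocalName))
                     .ToList();
    }

    public static void Main()
    {
        // Made-up sample markup standing in for the real parent element.
        var root = XElement.Parse(
            "<div><h2>Title</h2><p>first</p><ul><li>a</li><li>b</li></ul>" +
            "<p>second</p><ul><li>c</li></ul></div>");

        foreach (var el in ChildrenNamed(root, "p", "ul"))
        {
            // Value concatenates the text of all descendant nodes.
            Console.WriteLine($"{el.Name.LocalName}: {el.Value}");
        }
    }
}
```

In the question's terms, the call would look like OrderedElements.ChildrenNamed(contentDiv.Parent.Parent.Parent.Parent.Parent, "p", "ul"). Comparing on Name.LocalName rather than Name means the filter should still match if the parsed document carries an XHTML namespace.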