
Parsing content with the HTML Agility Pack and LINQ

I am trying to extract the significant content surrounding searched keywords in an HTML document.

I am using the code below to generate an HtmlNodeCollection:

var findclasses = doc.DocumentNode.SelectNodes("//body//*[not(self::script)]")
    .Where(x => x.InnerHtml.Contains("SearchedKeywordText") && x.InnerHtml.Contains("SearchedKeyword1Text"))
    .OrderBy(x => x.Name);
string FirstContent = findclasses.First().InnerText;

And I am getting this result:

  • Results View Expanding the Results View will enumerate the IEnumerable
  • [0] Name: "div"} HtmlAgilityPack.HtmlNode
  • [1] Name: "div"} HtmlAgilityPack.HtmlNode
  • [2] Name: "div"} HtmlAgilityPack.HtmlNode
  • [3] Name: "ul"} HtmlAgilityPack.HtmlNode
  • [4] Name: "li"} HtmlAgilityPack.HtmlNode
  • [5] Name: "span"} HtmlAgilityPack.HtmlNode
  • [6] Name: "span"} HtmlAgilityPack.HtmlNode
  • [7] Name: "div"} HtmlAgilityPack.HtmlNode
  • [8] Name: "span"} HtmlAgilityPack.HtmlNode
  • [9] Name: "span"} HtmlAgilityPack.HtmlNode
  • [10] Name: "ul"} HtmlAgilityPack.HtmlNode
  • [11] Name: "li"} HtmlAgilityPack.HtmlNode

But when I make a simple change so that the search strings come from variables:

string search1 = "SearchedKeywordText";
string search2 = "SearchedKeyword1Text";
..
..
var findclasses = doc.DocumentNode.SelectNodes("//body//*[not(self::script)]")
    .Where(x => x.InnerHtml.Contains(search1) && x.InnerHtml.Contains(search2))
    .OrderBy(x => x.Name);
string FirstContent = findclasses.First().InnerText;

Result:

  • Results View Expanding the Results View will enumerate the IEnumerable
    Empty "Enumeration yielded no results"

The enumeration in the first block works fine for me, but after this change it returns nothing. Does anyone have an idea what causes this?

You are calling .First() on an empty IEnumerable, which throws an InvalidOperationException.

You could use .Any() to check that findclasses is not empty:

if (findclasses.Any())
{
   string firstContent = findclasses.First().InnerText;
}
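As an alternative to the .Any() check, here is a minimal sketch using .FirstOrDefault(), which returns null instead of throwing when nothing matches. It also guards against SelectNodes returning null, which HTML Agility Pack does when the XPath matches no nodes. The class name FirstMatch and the method name FindFirstText are hypothetical, introduced only for illustration; the XPath and the filter are taken from the question.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper (not from the original post).
public static class FirstMatch
{
    // Returns the inner text of the first node containing both keywords,
    // or null when no node matches.
    public static string FindFirstText(string html, string search1, string search2)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//body//*[not(self::script)]");
        if (nodes == null)
        {
            return null;
        }

        // FirstOrDefault yields null for an empty sequence instead of throwing.
        var first = nodes
            .Where(x => x.InnerHtml.Contains(search1) && x.InnerHtml.Contains(search2))
            .OrderBy(x => x.Name)
            .FirstOrDefault();

        return first?.InnerText;
    }
}
```

The caller then only needs a single null check on the returned string, and the query is enumerated once rather than twice (once for .Any() and once for .First()).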
  • Why is it empty?

Maybe there are results, but there is a case mismatch and you need to make your search case-insensitive. For that, rather than

x.InnerHtml.Contains(search1) 

you can do something like:

x.InnerHtml.IndexOf(search1, StringComparison.InvariantCultureIgnoreCase) >= 0

That will return true if the search keyword is found, regardless of letter case.
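The IndexOf check above can be wrapped in a small helper so that both keyword tests read the same way. This is only a sketch: the class name KeywordSearch and the method name ContainsIgnoreCase are hypothetical, and the null handling is an assumption added for safety. On newer runtimes (.NET Core 2.1 and later) string.Contains also has an overload that accepts a StringComparison, which would make the helper unnecessary.

```csharp
using System;

// Hypothetical helper (not from the original post).
public static class KeywordSearch
{
    // True when keyword occurs in source, ignoring letter case.
    public static bool ContainsIgnoreCase(string source, string keyword)
    {
        // Assumption: treat a missing source or keyword as "not found".
        if (source == null || keyword == null)
        {
            return false;
        }

        // Same comparison as in the answer above.
        return source.IndexOf(keyword, StringComparison.InvariantCultureIgnoreCase) >= 0;
    }
}
```

In the query from the question, the filter would then become .Where(x => KeywordSearch.ContainsIgnoreCase(x.InnerHtml, search1) && KeywordSearch.ContainsIgnoreCase(x.InnerHtml, search2)).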


 