
HtmlAgilityPack: get two nodes simultaneously in C#

I'm trying to parse an HTML page, and I want to get a pair of nodes from each item in this markup:

 <li class="classli"> 
    <div class="element">element1</div>  
    <div class="description">description1</div> 
  </li>  
  <li class="classli"> 
    <div class="element">element2</div>  
    <div class="description">description2</div> 
  </li>  
  <li class="classli"> 
    <div class="xxxelementclass">element3</div>  
    <div class="description">description3</div> 
  </li>  
  <li class="classli"> 
    <div class="element">element4</div>  
    <div class="xxxclass">description4</div> 
  </li> 

I tried this in C#:

foreach(var node in doc.SelectNodes("//li[contains(@class,classli)]"))
{
    listelement.Add(node.SelectSingleNode("//div[contains(@class,element)]").InnerText);
    listdescription.Add(node.SelectSingleNode("//div[contains(@class,description)]").InnerText);
}

In the HTML page, not all the <li> tags contain the same child tags. I want to get the element and the description only where both are present.

Make the XPath in your foreach loop look like the following:

//li[contains(@class,'classli') and ./div[contains(@class,'element')] and ./div[contains(@class,'description')]]

This only selects the li elements that have both divs with the given classes as child nodes. Also note that the XPath expressions inside your foreach loop need to be evaluated relative to the current li node, because an expression that starts with // searches the whole document no matter which node it is called on. Use ./ for children or .// for descendants, such as:

./div[contains(@class,'element')]

./div[contains(@class,'description')]
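
Put together, the outer filter and the two relative lookups might look like the sketch below. It assumes the HtmlAgilityPack NuGet package is referenced and that the page is available as a string; the class name `PairExtractor`, the method `ExtractPairs`, and the shortened sample markup are illustrative and not from the question. `SelectNodes` is called on `DocumentNode` and returns null when nothing matches, so that case is checked first.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class PairExtractor
{
    // Returns (element, description) text pairs for every <li class="classli">
    // that has both child divs. Names here are hypothetical, for illustration.
    public static List<(string Element, string Description)> ExtractPairs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var pairs = new List<(string Element, string Description)>();

        // Outer query: only <li> nodes that contain both divs as children.
        var items = doc.DocumentNode.SelectNodes(
            "//li[contains(@class,'classli')" +
            " and ./div[contains(@class,'element')]" +
            " and ./div[contains(@class,'description')]]");

        if (items == null) // SelectNodes returns null when there is no match
            return pairs;

        foreach (var li in items)
        {
            // "./" keeps each lookup relative to the current <li>.
            var element = li.SelectSingleNode("./div[contains(@class,'element')]").InnerText;
            var description = li.SelectSingleNode("./div[contains(@class,'description')]").InnerText;
            pairs.Add((element, description));
        }
        return pairs;
    }

    static void Main()
    {
        // Shortened version of the markup from the question.
        var html = @"<ul>
  <li class=""classli""><div class=""element"">element1</div><div class=""description"">description1</div></li>
  <li class=""classli""><div class=""element"">element4</div><div class=""xxxclass"">description4</div></li>
</ul>";

        foreach (var p in ExtractPairs(html))
            Console.WriteLine($"{p.Element}: {p.Description}");
    }
}
```

With this sample, only the first li should be selected, since the second one has no div whose class contains "description".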

A proper XPath expression to match by CSS class is a bit complicated. Taking a moderate approach, i.e. the second code snippet posted in this other answer, the XPath for your task would be as follows (split across lines for readability):

// Each predicate pads @class with spaces so that only whole class names match.
var query = @"//li[contains(concat(' ', @class, ' '), ' classli ')]
                  [div[contains(concat(' ', @class, ' '), ' element ')]]
                  [div[contains(concat(' ', @class, ' '), ' description ')]]";

// doc is assumed to be an HtmlNode, e.g. HtmlDocument.DocumentNode
foreach(var node in doc.SelectNodes(query))
{
    // No leading slash: these queries are relative to the current li
    var elementQuery = "div[contains(concat(' ', @class, ' '), ' element ')]";
    listelement.Add(node.SelectSingleNode(elementQuery).InnerText);

    var descriptionQuery = "div[contains(concat(' ', @class, ' '), ' description ')]";
    listdescription.Add(node.SelectSingleNode(descriptionQuery).InnerText);
}
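
To see why the padded form matters, the sketch below compares the plain substring test with the space-padded token test on three divs. It assumes HtmlAgilityPack is referenced; `ClassMatchDemo`, `Count`, and the sample markup are made-up names for illustration. With the markup from the question, the substring form would also accept the third li, because "xxxelementclass" contains "element".

```csharp
using System;
using HtmlAgilityPack;

static class ClassMatchDemo
{
    // Counts the nodes matching an XPath query (hypothetical helper).
    public static int Count(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count; // null means no match
    }

    static void Main()
    {
        var html = @"<div class=""element"">a</div>
<div class=""xxxelementclass"">b</div>
<div class=""big element"">c</div>";

        // Substring test: every class attribute containing "element" matches.
        Console.WriteLine(Count(html, "//div[contains(@class,'element')]"));

        // Token test: only divs with "element" as a whole class name match.
        Console.WriteLine(Count(html,
            "//div[contains(concat(' ', @class, ' '), ' element ')]"));
    }
}
```

The first query should count all three divs, the second only the two that carry element as a separate class.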

Thank you all for the help. I solved it this way:

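    // Requires: using System.Linq; using HtmlAgilityPack;
    // doc is assumed to be an HtmlNode, e.g. HtmlDocument.DocumentNode
    foreach (var node in doc.SelectNodes("//li[contains(@class,'classli')]"))
    {
        // Keep only the children whose class is exactly "element" or "description"
        List<HtmlNode> children = node.ChildNodes
            .Where(o => o.GetAttributeValue("class", "") == "element"
                     || o.GetAttributeValue("class", "") == "description")
            .ToList();

        // Take the matches in (element, description) pairs; an unpaired match is skipped
        for (int i = 0; i + 1 < children.Count; i += 2)
        {
            listelement.Add(children[i].InnerHtml);
            listdescription.Add(children[i + 1].InnerHtml);
        }
    }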
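
A variant of the same LINQ idea that does not depend on the order of the child divs could look like this. It is only a sketch, assuming HtmlAgilityPack is referenced; `LinqPairs` and `Extract` are hypothetical names, and the class comparison is an exact match, as in the solution above.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class LinqPairs
{
    public static List<(string Element, string Description)> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<(string Element, string Description)>();

        foreach (var li in doc.DocumentNode.Descendants("li")
                     .Where(n => n.GetAttributeValue("class", "") == "classli"))
        {
            // Look up each child div by its exact class value.
            var element = li.Elements("div")
                .FirstOrDefault(d => d.GetAttributeValue("class", "") == "element");
            var description = li.Elements("div")
                .FirstOrDefault(d => d.GetAttributeValue("class", "") == "description");

            // Keep the pair only when both divs are present in this li.
            if (element != null && description != null)
                result.Add((element.InnerText, description.InnerText));
        }
        return result;
    }
}
```

Because each div is looked up by class inside its own li, a missing element or description simply drops that li instead of shifting the pairing.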
