I want to fetch data from a website using HtmlAgilityPack. The website's content looks like this:
<div id="list">
<div class="list1">
<a href="example1.com" class="href1" >A1</a>
<a href="example4.com" class="href2" />
</div>
<div class="list2">
<a href="example2.com" class="href1" >A2</a>
<a href="example5.com" class="href2" />
</div>
<div class="list3">
<a href="example3.com" class="href1" >A3</a>
<a href="example6.com" class="href2" />
</div>
</div>
Now, I want to fetch the first two links that have class="href1". I am using this code:
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//a[@class='href1'][position()<3]");
But it is not working: it returns all three links. I want to fetch only the first two. How can I do this?
Hey! Now I want to do one more thing.
Above, I have only three links with class="href1". Suppose I have 10 links with class="href1" and I want to fetch only four of them, from the 6th link to the 9th. How can I fetch these particular four links?
Before applying the position() function, try wrapping the anchor selector in parentheses, like this:
var nodes = doc.DocumentNode.SelectNodes("(//a[@class='href1'])[position()<3]");
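The parentheses matter because `//a` is shorthand for `/descendant-or-self::node()/child::a`, so a predicate attached directly to that step counts position among the `a` children of each parent, while the parenthesized form builds the whole node-set first and then filters it. The same pattern should also cover the follow-up about links 6 to 9. Below is a minimal sketch, assuming HtmlAgilityPack is referenced; the `XPathRangeDemo` class, the `BuildSampleHtml` and `SelectLinkTexts` helpers, and the generated ten-link markup are made up for illustration and are not part of the original page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using HtmlAgilityPack;

public static class XPathRangeDemo
{
    // Hypothetical sample: ten <div> blocks, each holding one link with class="href1".
    public static string BuildSampleHtml()
    {
        var sb = new StringBuilder("<div id=\"list\">");
        for (int i = 1; i <= 10; i++)
        {
            sb.Append($"<div class=\"list{i}\"><a href=\"example{i}.com\" class=\"href1\">A{i}</a></div>");
        }
        sb.Append("</div>");
        return sb.ToString();
    }

    // Runs an XPath query and returns the inner text of each matched link.
    public static List<string> SelectLinkTexts(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null
            ? new List<string>()
            : nodes.Select(n => n.InnerText).ToList();
    }

    public static void Main()
    {
        string html = BuildSampleHtml();

        // Predicate is evaluated per parent <div>, where every link is the first <a> child.
        List<string> perParent = SelectLinkTexts(html, "//a[@class='href1'][position()<3]");

        // Parentheses collect all matches first; position() then indexes the combined set.
        List<string> firstTwo = SelectLinkTexts(html, "(//a[@class='href1'])[position()<3]");

        // Links 6 through 9 of the combined set.
        List<string> sixToNine = SelectLinkTexts(html,
            "(//a[@class='href1'])[position()>=6 and position()<=9]");

        Console.WriteLine(string.Join(", ", perParent));
        Console.WriteLine(string.Join(", ", firstTwo));
        Console.WriteLine(string.Join(", ", sixToNine));
    }
}
```

On this sample, the unparenthesized query matches every link, which is the behaviour described in the question, while the two parenthesized queries are limited to the requested positions.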
Why not just get them all and use the first two from the returned collection? Whatever XPath you would need for this would ultimately be a hell of a lot less readable than using LINQ:
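using System.Collections.Generic;
using System.Linq;
...
// Take(2) returns IEnumerable<HtmlNode>, not HtmlNodeCollection.
IEnumerable<HtmlNode> firstTwoHrefs = doc.DocumentNode
    .SelectNodes("//a[@class='href1']").Take(2);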
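The LINQ route also seems to fit the follow-up about links 6 to 9: skip the first five matches, then take four. Here is a rough sketch, assuming HtmlAgilityPack is referenced; the `LinqRangeDemo` class, the `LinksInRange` helper, and the generated sample markup are hypothetical names and data introduced only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class LinqRangeDemo
{
    // Returns links first..last (1-based, inclusive) among <a> elements with the given class.
    public static List<HtmlNode> LinksInRange(HtmlDocument doc, string cssClass, int first, int last)
    {
        HtmlNodeCollection all = doc.DocumentNode.SelectNodes($"//a[@class='{cssClass}']");
        if (all == null)
        {
            // SelectNodes yields null when nothing matches.
            return new List<HtmlNode>();
        }
        return all.Skip(first - 1).Take(last - first + 1).ToList();
    }

    public static void Main()
    {
        // Hypothetical sample markup with ten matching links.
        string html = string.Concat(Enumerable.Range(1, 10)
            .Select(i => $"<div><a href=\"example{i}.com\" class=\"href1\">A{i}</a></div>"));
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        foreach (HtmlNode link in LinksInRange(doc, "href1", 6, 9))
        {
            string href = link.GetAttributeValue("href", "");
            Console.WriteLine($"{link.InnerText} -> {href}");
        }
    }
}
```

The null check is there because `SelectNodes` returns null rather than an empty collection when the query matches nothing, which would otherwise make the `Skip` call throw.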