I want to extract information from an HTML page. With the code below I can read the values, and they arrive in this order:
new-link-1
new-link-2
new-link-3
But when the loop reaches the new-link-no-titel item, the order breaks and changes to:
new-link-3
new-link-1
new-link-2
At the end, the program stops with an ArgumentOutOfRangeException. This is the code:
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = await web.LoadFromWebAsync(Link);
foreach ((var item, int index) in doc.DocumentNode.SelectNodes(".//div[@class='new-link-1']").WithIndex())
{
var x = item.SelectNodes("//div[@class='new-link-2']")[index].InnerText;
var xx = item.SelectNodes("//div[@class='new-link-3']//a")[index];
MessageBox.Show(item.InnerText);
MessageBox.Show(x);
MessageBox.Show(xx.Attributes["href"].Value);
}
And this is the HTML:
<div id="new-link">
<ul>
<li>
<div class="new-link-1"> فصل پنجم</div>
<div class="new-link-2"> تکمیل شده</div>
<div class="new-link-3">
<a href="http://dlldsubtitle.info/Serial/1397/Silicon.Valley.S05_WorldSubtitle.zip">دانلود با لینک مستقیم</a>
</div>
</li>
<li class="new-link-no-titel">
<div class="new-link-1"> فصل ششم</div>
<div class="new-link-2"> درحال پخش</div>
<div class="new-link-3">
<i class="fa fa-arrow-down" title="حال پخش">
</i>
</div>
</li>
<li>
<div class="new-link-1"> قسمت 1</div>
<div class="new-link-2"> پخش شده</div>
<div class="new-link-3">
<a href="http://dl.worldsubtitle.info/Serial/1398/Silicon.Valley.S06E01_WorldSubtitle.zip">دانلودلینک مستقیم</a>
</div>
</li>
<li>
<div class="new-link-1"> قسمت 7</div>
<div class="new-link-2"> پخش شده</div>
<div class="new-link-3">
<a href="http://dl.worldsubtitle.info/Serial/1398/Silicon.Valley.S06E07_WorldSubtitle.zip">دانلود با لینک مستقیم</a>
</div>
</li>
</ul>
</div>
This is what I found to be the issue with your code.
foreach ((var item, int index) in doc.DocumentNode.SelectNodes(".//div[@class='new-link-1']").WithIndex()) // -> yields 4 indices (0 to 3)
item.SelectNodes("//div[@class='new-link-2']") // -> returns 4 nodes
item.SelectNodes("//div[@class='new-link-3']//a") // -> returns only 3 nodes
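Issue: An XPath expression that starts with //div searches the whole document, not just the descendants of the item you are currently on. The last query therefore returns only 3 nodes for 4 indices: from the second item on, each link is paired with the wrong item, and the last index throws the ArgumentOutOfRangeException.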
Solution/Suggestion: Your current code searches all a elements starting from the root node. If you prefix the expression with a dot instead (.//a), only the descendants of the current node are considered. (Excerpt from here.)
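// Iterate over each <li> and query relative to it (note the leading dot).
foreach (HtmlNode item in doc.DocumentNode.SelectNodes(".//li"))
{
    var x0 = item.SelectSingleNode(".//div[@class='new-link-1']");
    var x = item.SelectSingleNode(".//div[@class='new-link-2']");
    var xx = item.SelectSingleNode(".//a"); // null when the item has no link

    // SelectSingleNode returns null when nothing matches, so check each
    // result instead of hiding the failure in an empty catch block.
    if (x0 != null)
        MessageBox.Show(x0.InnerText);
    if (x != null)
        MessageBox.Show(x.InnerText);
    if (xx != null && xx.Attributes["href"] != null)
        MessageBox.Show(xx.Attributes["href"].Value);
}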
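To see the difference between the two XPath forms in isolation, here is a minimal console sketch. It assumes the HtmlAgilityPack package is referenced; the two-item HTML string and the class name XPathScopeDemo are made up for illustration and are not taken from the page in the question.

```csharp
using System;
using HtmlAgilityPack;

class XPathScopeDemo
{
    static void Main()
    {
        // Hypothetical sample: only the first <li> contains a link.
        const string html =
            "<ul>" +
            "<li><div class='new-link-3'><a href='a.zip'>one</a></div></li>" +
            "<li><div class='new-link-3'><i></i></div></li>" +
            "</ul>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        foreach (HtmlNode item in doc.DocumentNode.SelectNodes("//li"))
        {
            // "//a" ignores the context node and scans the whole document.
            HtmlNodeCollection global = item.SelectNodes("//a");

            // ".//a" is evaluated relative to the current <li>;
            // SelectNodes returns null when nothing matches.
            HtmlNodeCollection local = item.SelectNodes(".//a");

            Console.WriteLine($"global: {global?.Count ?? 0}, local: {local?.Count ?? 0}");
        }
    }
}
```

The global count should be the same for both items, because that query never looks at the current item, while the local count should drop to zero for the item without a link. That mismatch is what shifts the indices in the original loop.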