C# HtmlAgilityPack get content from all div with given class

I have an HTML file that looks like this:

<div class="user_meals">
    <div class="name">Name Surname</div>
    <div class="day_meals">
        <div class="meal">First Meal</div>
    </div>
    <div class="day_meals">
        <div class="meal">Second Meal</div>
    </div>
    <div class="day_meals">
        <div class="meal">Third Meal</div>
    </div>
    <div class="day_meals">
        <div class="meal">Fourth Meal</div>
    </div>
    <div class="day_meals">
        <div class="meal">Fifth Meal</div>
    </div>
</div>

This block repeats several times in the file.

I want to get the name and surname, which is the text inside the <div> tag with class "name".

This is my code using HtmlAgilityPack:

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(@"C:\workspace\file.html");

foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//div[@class='name']"))
{
    string value = node.InnerText;
}

But it doesn't work. Visual Studio throws an exception:

An unhandled exception of type 'System.NullReferenceException'.

You are using the wrong method to load HTML from a path: LoadHtml expects the HTML markup itself, not the location of a file. Use Load instead.

The error you are getting is quite misleading, because none of the properties are null and the standard tips from "What is a NullReferenceException, and how do I fix it?" don't apply.

Essentially, LoadHtml parses the path string as if it were markup, so the resulting document contains no <div> elements. SelectNodes then correctly returns null, because no elements match the query, and foreach throws when it tries to iterate over null.
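To see the cause in isolation, here is a minimal sketch that repeats the mistake and inspects the result instead of iterating over it. It assumes the HtmlAgilityPack NuGet package is referenced and uses C# top-level statements; the messages printed are only illustrative.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();

// The path is parsed as if it were markup; it contains no tags at all.
doc.LoadHtml(@"C:\workspace\file.html");

var nodes = doc.DocumentNode.SelectNodes("//div[@class='name']");

// Inspect the result rather than passing it straight to foreach.
Console.WriteLine(nodes == null
    ? "SelectNodes returned null: no matching elements"
    : $"Found {nodes.Count} node(s)");
```

No file needs to exist for this to run, which is the point: LoadHtml never touches the file system.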

Fixed code:

using System;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
// Either call doc.Load(@"C:\workspace\file.html") to read the file, or pass markup directly:
doc.LoadHtml("<div class='user_meals'><div class='name'>Name Surname</div></div>");
var nodes = doc.DocumentNode.SelectNodes("//div[@class='name']");
// SelectNodes returns null when nothing matches, so check before iterating.
if (nodes == null)
{
    throw new InvalidOperationException("Where are all my nodes???");
}
foreach (HtmlNode node in nodes)
{
    string value = node.InnerText;
    Console.WriteLine(value); // the original used value.Dump(), which is LINQPad-only
}
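
Since the block repeats once per user, the same idea can be extended to read each name together with its meals. The following is only a sketch under assumptions: the markup is a trimmed copy of the sample from the question, the "(no name)" placeholder is made up for illustration, and falling back to an empty sequence is one possible way to handle the null result, not the only one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml(@"
<div class='user_meals'>
  <div class='name'>Name Surname</div>
  <div class='day_meals'><div class='meal'>First Meal</div></div>
  <div class='day_meals'><div class='meal'>Second Meal</div></div>
</div>");

// Fall back to an empty sequence so foreach never sees null.
IEnumerable<HtmlNode> users =
    doc.DocumentNode.SelectNodes("//div[@class='user_meals']")
    ?? Enumerable.Empty<HtmlNode>();

foreach (HtmlNode user in users)
{
    // The leading "." keeps each query relative to the current user block.
    string name = user.SelectSingleNode(".//div[@class='name']")?.InnerText.Trim()
                  ?? "(no name)"; // hypothetical placeholder
    IEnumerable<HtmlNode> meals =
        user.SelectNodes(".//div[@class='meal']") ?? Enumerable.Empty<HtmlNode>();

    Console.WriteLine($"{name}: {string.Join(", ", meals.Select(m => m.InnerText.Trim()))}");
}
```

Note that `@class='name'` is an exact string comparison, so it only matches elements whose class attribute is exactly `name`, as in the sample markup.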
