HTML parsing with HtmlAgilityPack in C#
WebClient webClient = new WebClient();
string page = webClient.DownloadString(
"http://www.deu.edu.tr/DEUWeb/Guncel/v2_index_cron.html");
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(page);
I want to parse the page given above and get the row information from its table. I have tried several examples but could not get any of them to work. Any suggestions?
You could, for example, parse the rows like this:
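using System.Net;
using HtmlAgilityPack;

namespace ConsoleApplication5
{
    class Program
    {
        static void Main(string[] args)
        {
            // Download the raw HTML of the page.
            WebClient webClient = new WebClient();
            string page = webClient.DownloadString("http://www.deu.edu.tr/DEUWeb/Guncel/v2_index_cron.html");

            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(page);

            // First <table> in the document; null if the page has none.
            HtmlNode table = doc.DocumentNode.SelectSingleNode("//table");
            if (table == null) return;

            // SelectNodes returns null (not an empty collection) when nothing matches.
            HtmlNodeCollection cells = table.SelectNodes("tr/td");
            if (cells == null) return;

            foreach (var cell in cells)
            {
                // Text content of the current <td>.
                string someVariable = cell.InnerText;
            }
        }
    }
}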
For completeness, using LINQ you can easily build a list that contains all non-empty cell values:
// Requires: using System; using System.Linq; using System.Net; using HtmlAgilityPack;
private static void Main(string[] args)
{
    WebClient webClient = new WebClient();
    string page = webClient.DownloadString("http://www.deu.edu.tr/DEUWeb/Guncel/v2_index_cron.html");
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(page);
    HtmlNode table = doc.DocumentNode.SelectSingleNode("//table");

    // Text of every cell in the first table, skipping blank cells.
    var cellValues = table.SelectNodes("tr/td")
        .Select(cell => cell.InnerText)
        .Where(text => !String.IsNullOrWhiteSpace(text))
        .ToList();
}
Here's an example that enumerates all of the table cells and writes each one's inner text to the console:
WebClient webClient = new WebClient();
var page = webClient.DownloadString("http://www.deu.edu.tr/DEUWeb/Guncel/v2_index_cron.html");
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(page);

// SelectNodes returns null when no <td> matches, so guard before iterating.
var cells = doc.DocumentNode.SelectNodes("//table/tr/td");
if (cells != null)
{
    foreach (var td in cells)
    {
        Console.WriteLine(td.InnerText);
    }
}
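The snippets above flatten every cell into one sequence, while the question asks for row information. A minimal sketch of keeping the cells grouped per row might look like the following. The class name RowExample, the helper ReadRows, and the inline sample markup are hypothetical and only there for illustration; the XPath "//table//tr" is an assumption chosen so that rows wrapped in a tbody are also matched, and it has not been checked against the actual page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

class RowExample
{
    // Hypothetical helper: returns one list of trimmed cell texts per <tr>.
    public static List<List<string>> ReadRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<List<string>>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//table//tr");
        if (rows == null) return result;

        foreach (HtmlNode row in rows)
        {
            // Only direct <td>/<th> children of this row.
            HtmlNodeCollection cells = row.SelectNodes("./td|./th");
            if (cells == null) continue;

            // DeEntitize turns entities such as &amp; back into plain characters.
            result.Add(cells
                .Select(c => HtmlEntity.DeEntitize(c.InnerText).Trim())
                .ToList());
        }
        return result;
    }

    static void Main()
    {
        // Sample markup invented for this sketch; replace with the downloaded page.
        const string html =
            "<table>" +
            "<tr><th>Date</th><th>Title</th></tr>" +
            "<tr><td>01.02</td><td>Exams &amp; results</td></tr>" +
            "</table>";

        foreach (List<string> row in ReadRows(html))
        {
            Console.WriteLine(string.Join(" | ", row));
        }
    }
}
```

To use it on the real page, pass the string returned by webClient.DownloadString(...) to ReadRows instead of the sample markup. If the page contains several tables, a narrower XPath would probably be needed.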