Fetch some data from a web page

Question

I used this tutorial to fetch the full content of a web page from C# code.

I now want to collect into an IEnumerable all the strings that appear inside the following text pattern (i.e. MY-TEXT):

data-address=" MY-TEXT "></

How can I do that? I tried using string.Split(), but got too much "white noise".

Any ideas?

Answer 1

A better solution is to use HtmlAgilityPack and let it handle the parsing/scraping for you. Here is an example:

var web = new HtmlWeb();
var doc = web.Load("http://www.stackoverflow.com");

var nodes = doc.DocumentNode.SelectNodes("//[@data-address]");

foreach (var node in nodes)
{
    Console.WriteLine(node.Attributes["data-address"].Value);
}

This fetches stackoverflow.com, finds all elements that have a data-address attribute, and prints the value of that attribute.

Answer 2

If the page is well-formed, load the content into an XDocument and query it with LINQ to XML.
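A minimal sketch of this approach, assuming the page really is well-formed XML (e.g. XHTML). The names DataAddressXml and Extract are hypothetical, and typical real-world HTML will often fail to parse this way:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DataAddressXml
{
    // Returns the trimmed data-address value of every element that has one.
    // Assumption: the input is well-formed XML; XDocument.Parse throws
    // System.Xml.XmlException otherwise.
    public static IEnumerable<string> Extract(string xhtml)
    {
        XDocument doc = XDocument.Parse(xhtml);

        return doc.Descendants()                 // every element in the document
                  .Attributes("data-address")    // only the data-address attributes
                  .Select(a => a.Value.Trim())   // " MY-TEXT " becomes "MY-TEXT"
                  .ToList();
    }
}
```

Elements without the attribute are simply skipped, so the result contains only the wanted strings.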

Answer 3

@alexn is right. A small correction though:

  var nodes = doc.DocumentNode.SelectNodes("//*[@data-address]");

Added the *: an XPath step needs a node test before the [@data-address] predicate, and * matches any element.
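Putting the corrected XPath together with the question's goal of an IEnumerable, a sketch might look like the following. The names DataAddressScraper and Extract are hypothetical; the null check is there because SelectNodes returns null rather than an empty collection when nothing matches:

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class DataAddressScraper
{
    // Collects the trimmed data-address value of every element in the document.
    public static IEnumerable<string> Extract(HtmlDocument doc)
    {
        // "//*[@data-address]" selects any element carrying the attribute.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//*[@data-address]");

        // No match yields null, so return an empty sequence instead of failing.
        if (nodes == null)
        {
            return Enumerable.Empty<string>();
        }

        return nodes
            .Select(n => n.GetAttributeValue("data-address", string.Empty).Trim())
            .ToList();
    }
}
```

With this in place, `DataAddressScraper.Extract(new HtmlWeb().Load("http://www.stackoverflow.com"))` would yield the attribute values for the page used in Answer 1.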

3 answers

Answer 1
4, accepted, 2011-08-27 17:31:59

Answer 2
0, 2011-08-27 17:33:51

Answer 3
0, 2011-08-27 19:36:57
