Reading HTML from an online website in C#

Question

I am reading websites in C# and getting their contents as a string. Some of the sites do not have well-formed HTML.

I tried HtmlAgilityPack and a few other libraries, but they need well-formed HTML, which is not possible in my case.

Now I need a very simple way to read the content by div or span id/class.

Here is my HTML: http://jsfiddle.net/bwJU7/

Please give me some simple C# code that will read

div class="item "

and get the title, price, photos and description from my HTML.

Answer 1

If you load the content as a string and cannot rely on it having a regular structure, then regular expressions are your friend.

Something like this might help you:

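using System;
using System.Text.RegularExpressions;

string content = "Your content goes here";

// \s* tolerates the trailing space in class="item "; Singleline lets . match newlines.
// The lazy (.*?) stops at the first </div>, so a nested div cuts the match short.
var regex = new Regex("<div[^>]*class=\"item\\s*\"[^>]*>(.*?)</div>", RegexOptions.Singleline);
foreach (Match div in regex.Matches(content))
{
    // Groups[0] is the whole match; Groups[1] holds only the inner HTML.
    Console.WriteLine(div.Groups[0].Value);
}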
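
To get from the matched item block to the individual fields the question asks for, the same regex approach can be applied a second time inside each block. The following is only a rough sketch: the sample markup and the class names `title`, `price` and `description` are made up for illustration (the actual jsfiddle markup is not reproduced here), and `ItemScraper`, `InnerHtml`, `ToText` and `ImageSources` are hypothetical helper names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ItemScraper
{
    // Hypothetical helper: raw inner HTML of the first <tag> whose class list
    // contains cssClass, or null if none is found. It stops at the first
    // closing </tag>, so it assumes no same-name tag is nested inside.
    public static string InnerHtml(string html, string tag, string cssClass)
    {
        string pattern =
            "<" + tag + @"\b[^>]*\bclass\s*=\s*[""'][^""']*(?<![\w-])" +
            Regex.Escape(cssClass) +
            @"(?![\w-])[^""']*[""'][^>]*>(.*?)</" + tag + @"\s*>";
        Match m = Regex.Match(html, pattern,
            RegexOptions.Singleline | RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Drops any leftover tags and collapses runs of whitespace.
    public static string ToText(string fragment)
    {
        if (fragment == null) return null;
        string noTags = Regex.Replace(fragment, "<[^>]+>", " ");
        return Regex.Replace(noTags, @"\s+", " ").Trim();
    }

    // Collects the quoted src attribute of every <img> in the fragment.
    public static List<string> ImageSources(string fragment)
    {
        var result = new List<string>();
        var img = new Regex(@"<img\b[^>]*\bsrc\s*=\s*[""']([^""']+)[""']",
            RegexOptions.IgnoreCase);
        foreach (Match m in img.Matches(fragment))
            result.Add(m.Groups[1].Value);
        return result;
    }

    public static void Main()
    {
        // Made-up sample with an unclosed <b>; class names are assumptions.
        string html =
            "<div class=\"item \">\n" +
            "  <span class=\"title\">Red bike</span>\n" +
            "  <span class=\"price\">$120</span>\n" +
            "  <img src=\"a.jpg\"><img src='b.jpg'>\n" +
            "  <span class=\"description\">Barely used, <b>like new</span>\n" +
            "</div>";

        string item = InnerHtml(html, "div", "item");
        Console.WriteLine(ToText(InnerHtml(item, "span", "title")));
        Console.WriteLine(ToText(InnerHtml(item, "span", "price")));
        Console.WriteLine(string.Join(", ", ImageSources(item)));
        Console.WriteLine(ToText(InnerHtml(item, "span", "description")));
    }
}
```

Because each pattern stops at the first closing tag, this only works while the fields are not nested inside same-name tags. For markup where items contain nested divs, a regex like this will truncate the block, and a tolerant HTML parser is likely the safer choice.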

Answer 1 · 0 · Accepted · 2013-06-19 12:05:09
