Reading HTML from an online website in C#
I am reading websites in C# and getting their contents as a string. Some of the sites do not have well-formed HTML.
I tried HtmlAgilityPack and some others, but they need well-formed HTML, which is not possible in my case.
Now I need a very simple way to read it by div or span id/class.
Here is my HTML: http://jsfiddle.net/bwJU7/
Please give me some simple C# code that reads each div class="item " and gets the title, price, photos, and description from my HTML.
If you load the content as a string and cannot rely on it having any regular structure, then regular expressions are your friend.
Something like this might help you:
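using System;
using System.Text.RegularExpressions;
string content = "Your content goes here";
// Singleline lets '.' span line breaks; \s* tolerates the trailing space in class="item ".
var regex = new Regex("<div[^>]*class=\"item\\s*\"[^>]*>(.*?)</div>",
    RegexOptions.Singleline | RegexOptions.IgnoreCase);
foreach (Match div in regex.Matches(content))
{
    // Groups[1] is the text inside the div (Groups[0] is the whole match). The lazy match stops at the first </div>, so nested divs are cut short.
    Console.WriteLine(div.Groups[1].Value);
}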
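The question also asks for the title, price, photos, and description inside each item, and a single lazy regex stops at the first nested closing div. One possible way around that is to count opening and closing tags, as in the rough sketch below. The helper names (ElementsByClass, FirstText, ImageSources) and the class names "title", "price", and "description" are assumptions made for illustration; the markup in the linked fiddle may use different ones, so adjust them to the real page.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: inner HTML of every <tag> whose class list contains className.
// Nested <tag>...</tag> pairs are counted so the match ends at the right closing tag.
static List<string> ElementsByClass(string html, string tag, string className)
{
    var results = new List<string>();
    string name = Regex.Escape(tag);
    var openTag = new Regex("<" + name + "\\b[^>]*?\\sclass\\s*=\\s*[\"']([^\"']*)[\"'][^>]*>",
        RegexOptions.IgnoreCase);
    var anyTag = new Regex("<" + name + "\\b[^>]*>|</" + name + "\\s*>", RegexOptions.IgnoreCase);

    int pos = 0;
    while (pos < html.Length)
    {
        Match open = openTag.Match(html, pos);
        if (!open.Success) break;
        pos = open.Index + open.Length;

        // Compare whole class tokens, so "item" matches class="item " but not "item-title".
        string[] classes = open.Groups[1].Value.Split(
            new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        if (Array.IndexOf(classes, className) < 0) continue;

        int depth = 1;
        int end = html.Length; // if the element is never closed, take the rest of the input
        for (Match m = anyTag.Match(html, pos); m.Success; m = m.NextMatch())
        {
            depth += m.Value[1] == '/' ? -1 : 1;
            if (depth == 0) { end = m.Index; break; }
        }
        results.Add(html.Substring(pos, end - pos));
        pos = end;
    }
    return results;
}

// Replaces tags with spaces, collapses whitespace, then decodes entities such as &amp;.
static string Text(string html)
{
    string noTags = Regex.Replace(html, "<[^>]+>", " ");
    return WebUtility.HtmlDecode(Regex.Replace(noTags, "\\s+", " ").Trim());
}

// Text of the first matching element, or an empty string when there is none.
static string FirstText(string html, string tag, string className)
{
    List<string> found = ElementsByClass(html, tag, className);
    return found.Count == 0 ? "" : Text(found[0]);
}

// All src values of <img> tags in the fragment.
static List<string> ImageSources(string html)
{
    var sources = new List<string>();
    foreach (Match m in Regex.Matches(html, "<img\\b[^>]*?\\ssrc\\s*=\\s*[\"']([^\"']*)[\"']",
        RegexOptions.IgnoreCase))
    {
        sources.Add(m.Groups[1].Value);
    }
    return sources;
}

// Made-up sample markup (not the fiddle's content); the <p> is deliberately left unclosed.
string sample = @"
<div class=""item "">
  <span class=""title"">Red bike</span>
  <div class=""photos""><img src=""a.jpg""><img src='b.jpg'></div>
  <span class=""price"">$120</span>
  <div class=""description""><p>Barely used,<br>great condition</div>
</div>
<div class=""item-footer"">not an item</div>";

foreach (string item in ElementsByClass(sample, "div", "item"))
{
    Console.WriteLine("Title: " + FirstText(item, "span", "title"));
    Console.WriteLine("Price: " + FirstText(item, "span", "price"));
    Console.WriteLine("Photos: " + string.Join(", ", ImageSources(item)));
    Console.WriteLine("Description: " + FirstText(item, "div", "description"));
}
```

This still treats the page as plain text, so it tolerates stray or unclosed inner tags, but an unclosed target element swallows everything up to the end of the input. Treat it as a starting point rather than a full parser.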