
Reading HTML from an online website in C#

I am reading websites in C# and getting their contents as strings. Some of the sites do not have well-formed HTML.

I tried HtmlAgilityPack and a few other libraries, but they need well-formed HTML, which is not possible in my case.

Now I need a very simple way to read elements by div or span id/class.

Here is my HTML: http://jsfiddle.net/bwJU7/

Please give me some simple C# code that reads each

div class="item "

and gets the title, price, photos, and description from my HTML.

If you load the content as a string and cannot expect any regular structure from it, then regular expressions are your friend.

Something like this might help you:

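using System;
using System.Text.RegularExpressions;

string content = "Your content goes here";

// \s* around "item" also accepts class="item " (trailing space, as in the question).
// Singleline lets '.' span line breaks; the lazy (.*?) stops at the first </div>,
// so an item that contains nested divs is cut short.
var regex = new Regex("<div\\b[^>]*\\bclass=\"\\s*item\\s*\"[^>]*>(.*?)</div>",
    RegexOptions.Singleline | RegexOptions.IgnoreCase);
foreach (Match div in regex.Matches(content))
{
    Console.WriteLine(div.Groups[1].Value); // Groups[1] is the inner HTML, Groups[0] the whole match
}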
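
To get from whole items to individual fields, one option is to cut the page at each item's opening tag and then run small per-field patterns on each chunk. This avoids relying on a matching </div>, which malformed pages may not have. The sketch below is only an illustration: the class names "title", "price", and "description", the sample markup, and the helper names ItemChunks, FirstByClass, and ImageSources are all assumptions, since the markup behind the fiddle link is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ItemScraper
{
    // Yields the text between one item's opening tag and the next item's
    // opening tag (or the end of the page), so unclosed divs do no harm.
    public static IEnumerable<string> ItemChunks(string html)
    {
        var open = new Regex("<div\\b[^>]*\\bclass=\"\\s*item\\s*\"[^>]*>", RegexOptions.IgnoreCase);
        MatchCollection starts = open.Matches(html);
        for (int i = 0; i < starts.Count; i++)
        {
            int from = starts[i].Index + starts[i].Length;
            int to = i + 1 < starts.Count ? starts[i + 1].Index : html.Length;
            yield return html.Substring(from, to - from);
        }
    }

    // Inner text of the first <tag> whose class list contains cls,
    // or an empty string when nothing matches.
    public static string FirstByClass(string html, string tag, string cls)
    {
        string pattern = "<" + tag + "\\b[^>]*\\bclass=\"[^\"]*\\b" + Regex.Escape(cls)
                       + "\\b[^\"]*\"[^>]*>(.*?)</" + tag + ">";
        Match m = Regex.Match(html, pattern, RegexOptions.Singleline | RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value.Trim() : "";
    }

    // All double-quoted src attributes of <img> tags in the chunk.
    public static List<string> ImageSources(string html)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(html, "<img\\b[^>]*\\bsrc=\"([^\"]*)\"", RegexOptions.IgnoreCase))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical sample; the second item is missing its closing </div>.
        string html =
            "<div class=\"item \"><span class=\"title\">Red bike</span>" +
            "<span class=\"price\">$120</span><img src=\"a.jpg\"><img src=\"b.jpg\">" +
            "<div class=\"description\">Barely used</div></div>" +
            "<div class=\"item \"><span class=\"title\">Blue kite</span>" +
            "<span class=\"price\">$15</span>";

        foreach (string chunk in ItemScraper.ItemChunks(html))
        {
            Console.WriteLine("Title:       " + ItemScraper.FirstByClass(chunk, "span", "title"));
            Console.WriteLine("Price:       " + ItemScraper.FirstByClass(chunk, "span", "price"));
            Console.WriteLine("Description: " + ItemScraper.FirstByClass(chunk, "div", "description"));
            Console.WriteLine("Photos:      " + string.Join(", ", ItemScraper.ImageSources(chunk)));
        }
    }
}
```

The per-field patterns still assume that each field's own tag is closed and holds no nested tag of the same name. If the real markup differs, the tag and class arguments passed to FirstByClass are the parts to adjust.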
