Extracting a string from an HTML page using C#
I have a source HTML page and I want to do the following:
I would be very grateful if someone could help me with this, because I don't have much experience with C#.
You could use this code:
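using System.Net.Http;
using System.Text;
using HtmlAgilityPack;

// Run this inside an async method.
HttpClient http = new HttpClient();
// ebay.com is only an example; any absolute URL (scheme included) works.
var response = await http.GetByteArrayAsync("https://www.ebay.com");
// Decode the full byte array as UTF-8.
string source = Encoding.UTF8.GetString(response);
// Parse the raw markup; decode entities per node (WebUtility.HtmlDecode) when reading text.
HtmlDocument Nodes = new HtmlDocument();
Nodes.LoadHtml(source);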
In the Nodes object, you will have all the DOM elements in the HTML page.
You could use LINQ to filter out whatever you need.
Example:
// Requires: using System.Linq; using System.Collections.Generic;
List<HtmlNode> RequiredNodes = Nodes.DocumentNode.Descendants()
    .Where(x => x.GetAttributeValue("class", "").Contains("List-Item"))
    .ToList();
You will probably need to install the Html Agility Pack NuGet package or download it from the link.
Hope this helps.
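
The filtering step can also be tried offline. Below is a minimal sketch, assuming the HtmlAgilityPack package is installed; the class name `ClassFilterDemo`, the helper `TextByClass`, and the sample markup are made up for illustration and are not part of the library. It matches whole class tokens rather than substrings, so an element with class "List-Item-Header" would not be picked up when you ask for "List-Item".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class ClassFilterDemo
{
    // Hypothetical helper: returns the trimmed, entity-decoded text of every
    // element whose class attribute contains the given token (case-insensitive).
    public static List<string> TextByClass(string html, string className)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        char[] whitespace = { ' ', '\t', '\r', '\n' };

        return doc.DocumentNode.Descendants()
            // Descendants() also yields text and comment nodes; keep elements only.
            .Where(n => n.NodeType == HtmlNodeType.Element)
            // GetAttributeValue returns the default ("") when the attribute is missing,
            // so nodes without a class attribute do not cause a null reference.
            .Where(n => n.GetAttributeValue("class", "")
                .Split(whitespace, StringSplitOptions.RemoveEmptyEntries)
                .Contains(className, StringComparer.OrdinalIgnoreCase))
            // Decode entities such as &amp; on the extracted text, not on the whole page.
            .Select(n => WebUtility.HtmlDecode(n.InnerText).Trim())
            .ToList();
    }

    public static void Main()
    {
        // Made-up sample markup standing in for a downloaded page.
        const string html =
            "<ul><li class=\"List-Item first\">Tea &amp; coffee</li>" +
            "<li>Skipped</li><li class=\"List-Item\">Milk</li></ul>";

        foreach (string text in TextByClass(html, "List-Item"))
        {
            Console.WriteLine(text);
        }
    }
}
```

To use it on a real page, pass the downloaded `source` string to `TextByClass` instead of the sample markup.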