
How to extract JSON embedded in an HTML page using C#

The JSON I want to use is embedded in an HTML page. Inside a `<script>` tag on the page there is a statement:

<script>
jsonRAW = {... heaps of JSON... }

Is there a parser that can extract this from the HTML? I have looked at Json.NET, but it requires reasonably well-formed JSON as input.
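One way to get Json.NET (or any JSON parser) the well-formed input it needs is to cut the object literal out of the script text first. The sketch below is an illustration, not a library API: the class and method names are hypothetical, it uses only the .NET base library (`System.Text.Json` instead of Json.NET), and it assumes the literal after `jsonRAW =` is valid JSON (double-quoted keys and strings). It starts at the first `{` after the variable name and counts braces, skipping anything inside string literals so that a `}` inside a value does not end the match early.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: isolates the object literal assigned to a JavaScript
// variable by brace counting, so a strict JSON parser can read it.
// Assumption: the literal is valid JSON (double-quoted keys and strings).
public static class EmbeddedJson
{
    // Returns the balanced "{...}" starting at the first '{' after variableName,
    // or null if the name or a balanced object is not found.
    public static string? ExtractAssignedObject(string source, string variableName)
    {
        int nameIndex = source.IndexOf(variableName, StringComparison.Ordinal);
        if (nameIndex < 0) return null;

        int start = source.IndexOf('{', nameIndex + variableName.Length);
        if (start < 0) return null;

        int depth = 0;
        bool inString = false;
        for (int i = start; i < source.Length; i++)
        {
            char c = source[i];
            if (inString)
            {
                if (c == '\\') i++;               // skip the escaped character
                else if (c == '"') inString = false;
                continue;                          // braces inside strings are ignored
            }
            if (c == '"') inString = true;
            else if (c == '{') depth++;
            else if (c == '}' && --depth == 0)
                return source.Substring(start, i - start + 1);
        }
        return null; // braces never balanced
    }

    // Convenience wrapper: extract, then parse with System.Text.Json.
    public static JsonDocument? ParseAssignedObject(string source, string variableName)
    {
        string? json = ExtractAssignedObject(source, variableName);
        return json == null ? null : JsonDocument.Parse(json);
    }
}
```

For example, calling `EmbeddedJson.ExtractAssignedObject(html, "jsonRAW")` on the page source returns only the `{...}` text, which can then be passed to `JsonDocument.Parse` or to Json.NET's `JObject.Parse`.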

You can try using HTML Agility Pack, which can be installed as a NuGet package. Once it is installed, see the HTML Agility Pack tutorial for more details. In code, it works like this:

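using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

var urlLink = "http://www.google.com/jsonPage"; // 1. Specify the URL of the page containing the JSON.
var web = new HtmlWeb();                        // 2. Initialise the HtmlWeb loader.
var doc = web.Load(urlLink);                    // 3. Download and parse the page.

if (doc.ParseErrors != null && doc.ParseErrors.Any()) { // 4. Check for parse errors and deal with them.
}

// 5. Access the DOM with an XPath query, e.g. the <script> element that mentions jsonRAW.
var scriptNode = doc.DocumentNode.SelectSingleNode("//script[contains(text(), 'jsonRAW')]");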

There is more to do in between, but this should get you started.
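Once the `<script>` element has been found, its text (for example the `InnerText` of the node returned by `SelectSingleNode`) still contains JavaScript around the JSON. A quick alternative to brace counting is a regular expression; the sketch below is illustrative only (the `ScriptJson` class is hypothetical) and relies on a strong assumption: the `jsonRAW` assignment is the last object literal in the script, because the greedy `.*` runs to the final `}` in the text. It parses with `System.Text.Json` from the base library rather than Json.NET.

```csharp
using System.Text.Json;
using System.Text.RegularExpressions;

// Hypothetical helper: pulls the jsonRAW object out of a script's text with a regex.
// Assumption: the assignment is the last object literal in the script, since the
// greedy ".*" (with Singleline, so it spans newlines) matches up to the LAST '}'.
public static class ScriptJson
{
    private static readonly Regex Assignment =
        new Regex(@"jsonRAW\s*=\s*(\{.*\})", RegexOptions.Singleline);

    // Returns the parsed object, or null when no jsonRAW assignment is present.
    public static JsonDocument? Parse(string scriptText)
    {
        Match match = Assignment.Match(scriptText);
        return match.Success ? JsonDocument.Parse(match.Groups[1].Value) : null;
    }
}
```

The regex is shorter than a brace-counting scanner but breaks if other code with braces follows the assignment in the same script; in that case a scanner that stops at the matching `}` is safer.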

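Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.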

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM