
Parsing HTML to get script variable value

I'm trying to find a way to access the data between <script> tags returned by a server I am making HTTP requests to. The document has multiple <script> tags, but only one of them contains inline JavaScript code; the rest load their code from external files. I want to access the code inside that script tag.

An example of the code is:

<html>
    // Some HTML

    <script>
        var spect = [['temper', 'init', []],
                    ['fw\/lib', 'init', [{staticRoot: '//site.com/js/'}]],
                    ["cap","dm",[{"tackmod":"profile","xMod":"timed"}]]];

    </script>

    // More HTML
</html>

I'm looking for an ideal way to grab the data assigned to 'spect' and parse it. Sometimes there is a space between 'spect' and the '=' and sometimes there isn't. I don't know why, and I have no control over the server.

I know this question may have been asked before, but the responses suggest using something like HTMLAgilityPack, and I'd rather avoid using a library for this task, as I only need to get the JavaScript from the DOM once.
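For the library-free route described above, one possible sketch uses a regular expression that tolerates optional whitespace around the '='. The class and method names (`SpectExtractor`, `ExtractSpectLiteral`) are hypothetical, and the pattern assumes the array ends at the first `]` followed by `;`, as in the sample page. It would stop early if a string inside the array contained `];`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpectExtractor
{
    // "var spect", optional whitespace around "=", then a lazy capture that
    // ends at the first "]" followed by ";" (an assumption based on the sample).
    private static readonly Regex SpectPattern = new Regex(
        @"\bvar\s+spect\s*=\s*(?<value>\[.*?\])\s*;",
        RegexOptions.Singleline); // Singleline lets "." span line breaks

    // Returns the raw JavaScript array literal, or null when nothing matches.
    public static string ExtractSpectLiteral(string html)
    {
        Match match = SpectPattern.Match(html);
        return match.Success ? match.Groups["value"].Value : null;
    }

    public static void Main()
    {
        // Both spacing variants mentioned in the question.
        string spaced = "<script>\n var spect = [['temper', 'init', []]];\n</script>";
        string tight = "<script>var spect=[['temper', 'init', []]];</script>";

        Console.WriteLine(ExtractSpectLiteral(spaced)); // [['temper', 'init', []]]
        Console.WriteLine(ExtractSpectLiteral(tight));  // [['temper', 'init', []]]
    }
}
```

Note that the captured text is a JavaScript literal rather than strict JSON (single-quoted strings, the unquoted `staticRoot` key), so a strict JSON parser would likely reject it without further processing.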

A very simple example of how easy this can be using the HTMLAgilityPack and Jurassic libraries to evaluate the result:

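// Requires the HtmlAgilityPack and Jurassic NuGet packages
using System;
using System.Linq;
using Jurassic.Library; // JSONObject

var html = @"<html>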
             // Some HTML
             <script>
               var spect = [['temper', 'init', []],
               ['fw\/lib', 'init', [{staticRoot: '//site.com/js/'}]],
               [""cap"",""dm"",[{""tackmod"":""profile"",""xMod"":""timed""}]]];
             </script>
             // More HTML
             </html>";

// Grab the content of the first script element
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
var script = doc.DocumentNode.Descendants()
                             .Where(n => n.Name == "script")
                             .First().InnerText;

// Evaluate the script, return the value of spect, and serialize it to a JSON string
var engine = new Jurassic.ScriptEngine();
var result = engine.Evaluate("(function() { " + script + " return spect; })()");
var json = JSONObject.Stringify(engine, result);

Console.WriteLine(json);
Console.ReadKey();

Output:

[["temper","init",[]],["fw/lib","init",[{"staticRoot":"//site.com/js/"}]],["cap","dm",[{"tackmod":"profile","xMod":"timed"}]]]

Note: I am not accounting for errors or anything else; this merely serves as an example of how to grab the script and evaluate it for the value of spect.
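As a rough sketch of the error handling the note leaves out, the variant below picks the script element that actually mentions spect (the question says the other script elements are external and have no body) and returns null instead of throwing. The names `SpectReader` and `TryReadSpectAsJson` are hypothetical, and catching only `JavaScriptException` is an assumption; other exception types may apply depending on the Jurassic version.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;
using Jurassic;
using Jurassic.Library;

public static class SpectReader
{
    // Returns spect serialized as JSON, or null if it cannot be found or evaluated.
    public static string TryReadSpectAsJson(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Choose the inline script that mentions spect rather than the first <script>.
        var script = doc.DocumentNode.Descendants("script")
                        .Select(n => n.InnerText)
                        .FirstOrDefault(text => text.Contains("spect"));
        if (script == null)
            return null;

        try
        {
            var engine = new ScriptEngine();
            var result = engine.Evaluate("(function() { " + script + " return spect; })()");
            return JSONObject.Stringify(engine, result);
        }
        catch (JavaScriptException)
        {
            // The script raised a JavaScript error, for example by relying on browser globals.
            return null;
        }
    }
}
```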

There are a few other libraries for executing/evaluating JavaScript as well.
