Getting the value of JavaScript/HTML variables in C#
There is a webpage I am trying to extract data from. Looking at the HTML in the page source, I can find the data I am interested in inside script tags. It looks like the following:
<html>
<script type="text/javascript">
    window.gon = {};
    gon.default_profile_mode = false;
    gon.user = null;
    gon.product = "shoes";
    gon.books_jsonarray = [
        {
            "title": "Little Sun",
            "authors": [
                "John Smith"
            ],
            edition: 2,
            year: 2009
        },
        {
            "title": "Little Prairie",
            "authors": [
                "John Smith"
            ],
            edition: 3,
            year: 2009
        },
        {
            "title": "Little World",
            "authors": [
                "John Smith",
                "Mary Neil",
                "Carla Brummer"
            ],
            edition: 3,
            year: 2014
        }
    ];
</script>
</html>
What I would like to do is request the webpage by its URL, retrieve the 'gon' variable from JavaScript, and store it in a C# variable. In other words, I would like a C# data structure (a dictionary, for instance) that holds the value of 'gon'.
I have researched how to get a variable defined in JavaScript via the C# WebBrowser control, and this is what I found:
using System;
using System.Collections.Generic;
using System.Windows.Forms;
using System.Net;
using System.Reflection; // needed for BindingFlags
using System.Runtime.InteropServices;
using System.Text.RegularExpressions;
using mshtml;

namespace Mynamespace
{
    public partial class Form1 : Form
    {
        public WebBrowser WebBrowser1 = new WebBrowser();

        private void Form1_Load(object sender, EventArgs e)
        {
            // Using the WebBrowser control to load the web page
            string myurl = "http://somewebsite.com";
            this.WebBrowser1.Navigate(myurl);
        }

        private void btnGetValueFromJs_Click(object sender, EventArgs e)
        {
            var mydoc = this.WebBrowser1.Document;
            IHTMLDocument2 vDocument = mydoc.DomDocument as IHTMLDocument2;
            IHTMLWindow2 vWindow = (IHTMLWindow2)vDocument.parentWindow;
            Type vWindowType = vWindow.GetType();

            object strfromJS = vWindowType.InvokeMember("mystr",
                BindingFlags.GetProperty, null, vWindow, new object[] { });
            // Here, I am able to see the string "Hello Sir"

            object gonfromJS = vWindowType.InvokeMember("gon",
                BindingFlags.GetProperty, null, vWindow, new object[] { });
            // Here, I am able to see the object gonfromJS as a '{System.__ComObject}'

            object gonbooksfromJS = vWindowType.InvokeMember("gon.books_jsonarray",
                BindingFlags.GetProperty, null, vWindow, new object[] { });
            // This error is thrown: 'An unhandled exception of type
            // 'System.Runtime.InteropServices.COMException' occurred in mscorlib.dll;
            // (Exception from HRESULT: 0x80020006 (DISP_E_UNKNOWNNAME))'
        }
    }
}
I am able to retrieve the values of string or number variables such as:
var mystr = "Hello Sir";
var mynbr = 8;
However, even though I can see that the 'gon' variable comes back as a '{System.__ComObject}', I don't know how to parse it to see the values of its sub-components. It would be nice if I could parse it directly, but if not, I would like a C# data structure with keys/values that contains all the nested data of the gon variable and, in particular, lets me read 'gon.books_jsonarray'.
Any help on how to achieve this would be very much appreciated. Note that I cannot change the source HTML/JavaScript in any way, so what I need is C# code that reaches this goal.
You can cast the result of InvokeMember() to dynamic and use the property names directly in your C# code. Array indexing is tricky, but it can be done with a call to InvokeScript(); see my example:
private void btnGetValueFromJs_Click(object sender, EventArgs e)
{
    var mydoc = this.WebBrowser1.Document;
    IHTMLDocument2 vDocument = mydoc.DomDocument as IHTMLDocument2;
    IHTMLWindow2 vWindow = (IHTMLWindow2)vDocument.parentWindow;
    Type vWindowType = vWindow.GetType();

    var gonfromJS = (dynamic)vWindowType.InvokeMember("gon",
        BindingFlags.GetProperty, null, vWindow, new object[] { });
    var length = gonfromJS.books_jsonarray.length;

    for (var i = 0; i < length; ++i)
    {
        var book = (dynamic)mydoc.InvokeScript("eval", new object[] { "gon.books_jsonarray[" + i + "]" });
        Console.WriteLine(book.title);
        /* prints:
         * Little Sun
         * Little Prairie
         * Little World
         */
    }
}
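The DISP_E_UNKNOWNNAME error in the question appears to come from passing "gon.books_jsonarray" as a single member name: InvokeMember looks up exactly one name on the target object, and the window has no member with that literal name. A minimal sketch of resolving such a path one segment at a time follows; the ComScript class and its method names are made up for illustration and are not part of any library.

```csharp
using System.Reflection;

static class ComScript
{
    // Reads a single named property from a late-bound script object,
    // the same way the code above reads "gon" from the window.
    public static object GetProperty(object target, string name)
    {
        return target.GetType().InvokeMember(
            name, BindingFlags.GetProperty, null, target, new object[] { });
    }

    // Resolves a dotted path such as "gon.product" one segment at a time,
    // because each InvokeMember call can only look up one member name.
    public static object GetPath(object root, string path)
    {
        object current = root;
        foreach (string segment in path.Split('.'))
        {
            current = GetProperty(current, segment);
        }
        return current;
    }
}
```

With the variables from the example above, ComScript.GetPath(vWindow, "gon.product") should return the string "shoes", while ComScript.GetPath(vWindow, "gon.books_jsonarray") should return another COM object whose elements still need the dynamic or eval approach.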
You need to use JSON.stringify to convert your gon.books_jsonarray variable to a JSON string. You can then retrieve the JSON with the following C# code:
var gonFromJS = mydoc.InvokeScript("eval", new object[] { "JSON.stringify(gon.books_jsonarray)" }).ToString();
You can then deserialize the JSON into objects using Newtonsoft.Json. My full code is here:
using Newtonsoft.Json;
using System;
using System.Collections.Generic;
using System.Windows.Forms;

namespace WindowsFormsApp1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            var webBrowser = new WebBrowser();

            // Read the variable only after the page (and its script) has loaded.
            webBrowser.DocumentCompleted += (s, ea) =>
            {
                var mydoc = webBrowser.Document;
                // Serialize the array inside the page and bring it back as a string.
                var gonFromJS = mydoc.InvokeScript("eval", new object[] { "JSON.stringify(gon.books_jsonarray)" }).ToString();
                var gonObject = JsonConvert.DeserializeObject<List<Books>>(gonFromJS);
            };

            var myurl = "http://localhost/test.html";
            webBrowser.Navigate(myurl);
        }

        // Json.NET matches JSON property names case-insensitively by default,
        // so "title" maps to Title, "authors" to Authors, and so on.
        private class Books
        {
            public string Title { get; set; }
            public List<string> Authors { get; set; }
            public int Edition { get; set; }
            public int Year { get; set; }
        }
    }
}
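The question also asks for a key/value structure that holds the whole gon variable, not only the books. One possible approach, sketched here on the assumption that JSON.stringify(gon) succeeds inside the WebBrowser control, is to parse the resulting string with JObject from Newtonsoft.Json.Linq, which behaves like a dictionary of property names to values. The GonReader class and the abbreviated sample string are illustrative only.

```csharp
using System;
using Newtonsoft.Json.Linq;

static class GonReader
{
    // Parses the string produced by JSON.stringify(gon) into a key/value tree.
    public static JObject Parse(string gonJson)
    {
        return JObject.Parse(gonJson);
    }

    public static void Main()
    {
        // Abbreviated stand-in for what JSON.stringify(gon) would return
        // for the page in the question (only one book is included here).
        string json =
            "{\"default_profile_mode\":false,\"user\":null,\"product\":\"shoes\"," +
            "\"books_jsonarray\":[{\"title\":\"Little Sun\"," +
            "\"authors\":[\"John Smith\"],\"edition\":2,\"year\":2009}]}";

        JObject gon = Parse(json);

        // Top-level values are looked up by key, like a dictionary.
        Console.WriteLine((string)gon["product"]); // prints: shoes

        // Nested arrays and objects are reachable the same way.
        foreach (JToken book in (JArray)gon["books_jsonarray"])
        {
            // prints: Little Sun (2009)
            Console.WriteLine("{0} ({1})", (string)book["title"], (int)book["year"]);
        }
    }
}
```

In the form above, the input string would come from mydoc.InvokeScript("eval", new object[] { "JSON.stringify(gon)" }) instead of the hard-coded sample.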
EDIT: You may also run into trouble with the JSON.stringify call: it can return null. This typically happens when the WebBrowser control renders the page in an older Internet Explorer compatibility mode in which the JSON object is not defined. In this case, see the related SO topics on this problem. If JSON.stringify returns null, try adding the following to your HTML page:
<head>
<meta http-equiv='X-UA-Compatible' content='IE=edge' >
</head>
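Since the question states that the page's HTML cannot be changed, a commonly used alternative is to switch the hosting application to a newer rendering mode through the FEATURE_BROWSER_EMULATION registry setting. The sketch below assumes a Windows machine with Internet Explorer 11 installed, uses the value 11001 (IE11 edge mode regardless of DOCTYPE), and must run before the first WebBrowser control is created, for example at the start of Main. The BrowserEmulation class name is made up for illustration.

```csharp
using System.Diagnostics;
using System.IO;
using Microsoft.Win32;

static class BrowserEmulation
{
    // Per-user feature-control key read by the WebBrowser control on startup.
    const string KeyPath =
        @"Software\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BROWSER_EMULATION";

    // Registers the current executable to render pages in IE11 edge mode.
    public static void UseIe11EdgeMode()
    {
        // The value name must be the file name of the hosting process.
        string exeName = Path.GetFileName(
            Process.GetCurrentProcess().MainModule.FileName);

        using (RegistryKey key = Registry.CurrentUser.CreateSubKey(KeyPath))
        {
            // 11001 = Internet Explorer 11 edge mode, ignoring the page's DOCTYPE.
            key.SetValue(exeName, 11001, RegistryValueKind.DWord);
        }
    }
}
```

Writing under HKEY_CURRENT_USER avoids the need for administrator rights, and the setting only affects the named executable.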