
Web scraping on a Silverlight 4 page

There is a web page with an embedded Silverlight application, and I want to scrape it. Is there a web scraping or browser automation solution (or any trick/hack) that supports Silverlight in C#, Java or Python?

I am currently trying Silvernium, but it is quite an outdated project and doesn't seem to work properly with Silverlight 4.

Here is some of the HTML for the Silverlight object in the page:

<object data="data:application/x-silverlight-2," type="application/x-silverlight-2" width="100%" height="100%">

<param name="source" value="PATH/WebSilverlight.xap"/>
<param name="onerror" value="onSilverlightError"/>
<param name="background" value="white"/>
<param name="minRuntimeVersion" value="4.0.50524.0"/>
<param name="autoUpgrade" value="true"/>
<param name="windowless" value="true"/>
<param name="enableautozoom" value="true"/>
...
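
As a starting point for any of the approaches discussed here, it can help to pull the `<param>` values out of the Silverlight `<object>` tag, for example to find the path of the `.xap` package or the required runtime version. The following is a minimal sketch using only Python's standard library; the class and function names are made up for illustration, and it assumes the page markup resembles the excerpt above.

```python
from html.parser import HTMLParser


class SilverlightParamParser(HTMLParser):
    """Collects <param> name/value pairs found inside a Silverlight <object> tag."""

    def __init__(self):
        super().__init__()
        self.in_silverlight = False
        self.params = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "object":
            # Silverlight objects declare a MIME type such as application/x-silverlight-2.
            mime = attrs.get("type") or ""
            self.in_silverlight = mime.startswith("application/x-silverlight")
        elif tag == "param" and self.in_silverlight and "name" in attrs:
            self.params[attrs["name"]] = attrs.get("value")

    def handle_endtag(self, tag):
        if tag == "object":
            self.in_silverlight = False


def silverlight_params(html):
    """Return a dict of the parameters declared for the Silverlight object in `html`."""
    parser = SilverlightParamParser()
    parser.feed(html)
    return parser.params
```

Calling `silverlight_params(page_html)["source"]` on the markup above would give the `.xap` path exactly as it is written in the page.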

I have worked successfully with the Telerik Testing Framework as an automation solution that supports Silverlight in C#.

It's free and, once you get used to it, very easy to work with thanks to its rich API and cross-browser compatibility. The trickiest part is probably the initial configuration of the tests.

Simple example:

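using ArtOfTest.WebAii.Core;
using ArtOfTest.WebAii.ObjectModel;

// Configure the framework to drive Internet Explorer.
Settings mySettings = new Settings();
mySettings.Web.DefaultBrowser = BrowserType.InternetExplorer;

// Start the manager and open a browser window.
Manager myManager = new Manager(mySettings);
myManager.Start();
myManager.LaunchNewBrowser();

myManager.ActiveBrowser.NavigateTo("http://www.example.com");

// Locate an <input> element by its tag index and click it.
Element mybtn = myManager.ActiveBrowser.Find.ByTagIndex("input", 3);
myManager.ActiveBrowser.Actions.Click(mybtn);

// Release the manager's resources when done.
myManager.Dispose();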

A good addition to it is the Windows Inspect tool. It lets you select any UI element and view its accessibility data, which helps in some tricky cases.

Update:

I've dug up some helpful documentation links that I used back in the day. Look at "Getting started with Silverlight UI Automation" and "Locating elements".

In the end, I implemented a workaround using SikuliX, a computer-vision-based automation tool, and got the information by printing a PDF out of the Silverlight web app, as a normal user would. Here is a script that shows how to run this along with Selenium.

Another alternative is to reverse-engineer the requests the application makes and keep the session alive while navigating to the information you need, using Scrapy, abot, crawler4j or any other similar technology.
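
To illustrate the session-keeping idea, here is a rough sketch using only Python's standard library rather than any of the crawlers named above. It builds an opener that stores cookies and sends them back on later requests, so the server sees one continuous session. The function names are hypothetical, and the URLs you pass in would be the ones you observe the Silverlight app calling (for example in the browser's network tools); none of them come from the original page.

```python
import http.cookiejar
import urllib.request


def make_session(user_agent="Mozilla/5.0"):
    """Build an opener that stores cookies and resends them on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def fetch(opener, url, data=None, timeout=30):
    """GET the URL, or POST when `data` (bytes) is given, and return the raw body."""
    with opener.open(url, data=data, timeout=timeout) as response:
        return response.read()


def crawl(urls):
    """Fetch the URLs in order through one shared session (assumed request sequence)."""
    opener, _jar = make_session()
    return [fetch(opener, url) for url in urls]
```

A typical use would be to request the login or landing page first, so the session cookie is set, and then request the data endpoints in the same `crawl` call. Whether this works depends on how the app talks to its server; some Silverlight apps use binary-encoded service calls that need extra decoding.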
