
How do I get HTML produced by JavaScript?

I know the title is not very clear, so I'll give an example. There are two sites, A and B; let's say they are financial sites. From each site I need just one page (the one with Italian pizza quotations), so that I can compare some values and know where and when to sell Italian pizza at higher prices.

Everything is easy with site A because it doesn't use JavaScript: in a browser I click the menu item "Italy > Italian pizza" and find the URL I need, www.siteA.com/italy/italianPizzaValues. Site B is different: clicking the "Italy" menu item redirects to www.siteB.com/italy.do, and clicking Italy's submenu items, such as Pasta and Pizza, doesn't change the URL but just invokes JavaScript functions (usually very complex ones).

So for site A I use libcurl to download www.siteA.com/italy/italianPizzaValues and then parse it. What should I do with site B to get the same result, i.e. my Italian pizza values for site B?
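The site A workflow described above (download the static page, then parse it) can be sketched with Python's standard library as follows. It uses `urllib.request` in place of libcurl, and it assumes, hypothetically, that the quotations sit in an ordinary HTML table; the real page's markup may need a different extraction.

```python
import urllib.request
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_rows(html):
    """Return the table rows of an HTML string as lists of cell texts."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch(url, timeout=10):
    """Download a static page (what libcurl does in the question) as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

For site A this would be `parse_rows(fetch("http://www.siteA.com/italy/italianPizzaValues"))`. Site B is the problem because a plain download never runs the JavaScript that produces the values.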

In The Productive Programmer, Neal Ford suggests using Selenium for non-testing purposes such as yours. Selenium works by automating interactions with a web browser. It's designed for testing, but, as Ford suggests, it can be used for other purposes too. Using the Selenium IDE, you can record your interactions with the web page, referencing HTML elements (including ones rendered by JavaScript), and then export the generated code to one of several high-level programming languages (Java, .NET, PHP, Python, Perl or Ruby).
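Selenium IDE generates its own code, so the following is not IDE output but a hand-written Python sketch of what such a script does for site B: open www.siteB.com/italy.do, click the "Pizza" link, wait until the JavaScript-generated content appears, and return the resulting HTML. The link text "Pizza" and the CSS selector `table#pizza-quotes` are hypothetical; use whatever the real page contains. The helper takes the driver as a parameter, so the `selenium` package is only needed when `capture_with_firefox` actually runs.

```python
import time

# Selenium locator strategies are plain strings; these values match
# selenium.webdriver.common.by.By.LINK_TEXT and By.CSS_SELECTOR.
LINK_TEXT = "link text"
CSS_SELECTOR = "css selector"


def capture_after_clicks(driver, url, menu_path, ready_css, timeout=10.0, poll=0.25):
    """Open url, click each menu link in turn, wait for ready_css, return the DOM."""
    driver.get(url)
    for label in menu_path:
        # Clicking runs the page's own JavaScript, just as a user would.
        driver.find_element(LINK_TEXT, label).click()
    # Poll until the JavaScript-generated content exists (Selenium's
    # WebDriverWait does the same job; a plain loop keeps this self-contained).
    deadline = time.monotonic() + timeout
    while not driver.find_elements(CSS_SELECTOR, ready_css):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{ready_css!r} did not appear within {timeout} s")
        time.sleep(poll)
    # page_source serializes the page as the browser currently holds it.
    return driver.page_source


def capture_with_firefox(url, menu_path, ready_css):
    """Run capture_after_clicks in a real Firefox (needs the selenium package)."""
    from selenium import webdriver  # imported lazily; the helper above has no dependency

    driver = webdriver.Firefox()
    try:
        return capture_after_clicks(driver, url, menu_path, ready_css)
    finally:
        driver.quit()
```

A call such as `capture_with_firefox("http://www.siteB.com/italy.do", ["Pizza"], "table#pizza-quotes")` would return HTML that can then be parsed the same way as site A's page.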

Before you go down the route of emulating a real browser and executing the JavaScript, try accessing the page in question in a real browser with a network monitor running. One option is Firefox with Firebug open on its 'Net' tab; for Internet Explorer, there is Fiddler.

Look through the requests and responses that occur when you click 'Pizza', and see whether there's an obvious XMLHttpRequest that seems to contain the data you're looking for. If so, it'll be much quicker to just make that one request yourself.
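If the network monitor does turn up such a request, replaying it outside the browser can be sketched with Python's standard `urllib`, assuming it is a form-encoded POST. The endpoint `http://www.siteB.com/italy/pizzaQuotes.do` and the `product`/`market` form fields used below are hypothetical placeholders; copy the real method, URL, parameters and headers from what Firebug or Fiddler shows.

```python
import urllib.parse
import urllib.request


def build_xhr_request(url, form, referer):
    """Rebuild a form-encoded POST seen in the network monitor (hypothetical shape)."""
    body = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            # JavaScript libraries such as jQuery add this header to XHRs;
            # some servers only answer requests that carry it.
            "X-Requested-With": "XMLHttpRequest",
            "Referer": referer,
        },
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

Keeping request construction separate from sending makes it easy to compare the rebuilt request with the captured one before calling `send(build_xhr_request(...))`. The response may be JSON, XML or an HTML fragment, depending on the site.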
