
How to get javascript-generated content from another website using cURL?

Basically, a page generates some dynamic content, and I want to get that dynamic content, not just the static HTML. I can't do this with cURL. Please help.

You can't with just cURL.

cURL will grab the specific raw (static) files from the site, but to get JavaScript-generated content, you would have to load those files into a browser-like environment that supports JavaScript and all the other host objects the script uses, so that the script can run.

Then, once the script has run, you would have to access the DOM to grab whatever content you wanted from it.

This is why most search engines don't index JavaScript-generated content. It's not easy.


If this is one specific site that you're trying to gather info on, you may want to look into exactly how the site gets the data itself and see if you can get the data directly from that source. For example, is the data embedded in JS in the page (in which case you can just parse out that JS), is it obtained from an ajax call (in which case you can maybe just make that ajax call directly), or does it come from some other method?
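As a rough illustration of those two shortcuts, here is a minimal Python sketch. Everything site-specific in it is made up for the example: the sample HTML, the variable name `pageData`, and the helper names `extract_embedded_json` and `fetch_json`. It also assumes the embedded literal is valid JSON (quoted keys, no trailing commas); a looser JS object literal would need a real JS parser.

```python
import json
import re
import urllib.request

# Hypothetical page source: the data is assigned to a JS variable in an inline script.
SAMPLE_HTML = """
<html><body>
<div id="prices"></div>
<script>
  var pageData = {"items": [{"name": "widget", "price": 9.5}, {"name": "gadget", "price": 12}]};
  render(pageData);
</script>
</body></html>
"""


def extract_embedded_json(html, var_name):
    """Return the JSON value assigned to `var_name` in inline JS, or None if not found."""
    match = re.search(r"\b" + re.escape(var_name) + r"\s*=\s*", html)
    if match is None:
        return None
    try:
        # raw_decode parses one JSON value starting at the given index
        # and ignores whatever follows it (here, the rest of the script).
        value, _end = json.JSONDecoder().raw_decode(html, match.end())
    except json.JSONDecodeError:
        return None
    return value


def fetch_json(url, timeout=10):
    """Call a JSON endpoint directly, the way the page's own ajax request would."""
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/json",
            # Some servers check this header on ajax endpoints; that is an assumption here.
            "X-Requested-With": "XMLHttpRequest",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


data = extract_embedded_json(SAMPLE_HTML, "pageData")
print([item["name"] for item in data["items"]])
```

To find the real endpoint for `fetch_json`, watch the network tab of the browser's developer tools while the page loads and copy the request URL (and any required headers or cookies) from there.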

You can try Selenium (http://seleniumhq.org), which supports JS.
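A minimal sketch of the Selenium route, using its Python bindings. It assumes the `selenium` package and a recent Chrome are installed; the function name `fetch_rendered_text`, the URL, and the CSS selector in the usage line are placeholders, not anything from the site in question.

```python
def fetch_rendered_text(url, css_selector, timeout=10):
    """Load `url` in headless Chrome, wait for `css_selector`, and return that element's text."""
    # Imported here so this module can still be loaded where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # assumes a Chrome version with the new headless mode
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the script has inserted the element into the DOM (or time out).
        element = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return element.text
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

A call would look like `fetch_rendered_text("https://example.com/page", "#prices")`. Waiting for a specific element matters because `driver.get` returns once the page has loaded, which may be before the script has finished generating its content.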
