

How can I get the HTML generated with JavaScript?

I want to get the HTML content of a web page, but most of the content is generated by JavaScript.

Is it possible to get this generated HTML (with Python, if possible)?

The only way I know of to do this from your server is to run the page in an actual browser engine. The engine parses the HTML, builds the normal DOM environment and runs the page's JavaScript; you then reach into that DOM and read the innerHTML of the body tag.

This could be done by launching Chrome from Python with the appropriate URL, then using a Chrome extension to fetch the dynamically generated HTML once the page has finished initializing and send it back to your Python code.
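A simpler variant of the same idea, with no extension involved, is headless Chrome's `--dump-dom` flag, which prints the DOM to stdout after the page loads. This is a swapped-in technique, not the plugin approach described above. The sketch below assumes a Chrome or Chromium binary is on `PATH` under one of the listed names (an assumption, adjust for your system); `build_dump_dom_command` and `get_rendered_html` are hypothetical helper names.

```python
import shutil
import subprocess

# Assumption: one of these executable names points at Chrome/Chromium on this machine.
CHROME_CANDIDATES = ("google-chrome", "chromium", "chromium-browser", "chrome")


def build_dump_dom_command(chrome_path: str, url: str) -> list[str]:
    """Build the argv for a headless Chrome run that prints the rendered DOM."""
    return [
        chrome_path,
        "--headless",     # no visible window
        "--disable-gpu",  # commonly recommended when running headless
        "--dump-dom",     # print the DOM after the page's scripts have run
        url,
    ]


def find_chrome() -> str:
    """Return the first Chrome/Chromium executable found on PATH."""
    for name in CHROME_CANDIDATES:
        path = shutil.which(name)
        if path:
            return path
    raise FileNotFoundError("No Chrome/Chromium executable found on PATH")


def get_rendered_html(url: str, timeout: float = 30.0) -> str:
    """Launch headless Chrome on `url` and return the DOM it prints to stdout."""
    cmd = build_dump_dom_command(find_chrome(), url)
    result = subprocess.run(
        cmd, capture_output=True, text=True, timeout=timeout, check=True
    )
    return result.stdout
```

Usage: `html = get_rendered_html("https://example.com/")`. Note that the DOM is dumped shortly after the page loads, so content that arrives later through slow Ajax calls may be missing; a driver that can wait for a specific element (such as Selenium) handles that case better.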

Check out Selenium. It has a Python driver, which might be what you're looking for.
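A minimal Selenium sketch, assuming Selenium 4 and a local Chrome install: it loads the page in headless Chrome, waits until an element matching `wait_css` exists, then reads `document.body.innerHTML` from the live DOM. The helper name `get_body_inner_html` and the default selector `"body"` are illustrative assumptions; in practice you would pass a selector for an element that the page's JavaScript inserts, so the wait is meaningful.

```python
def get_body_inner_html(url, wait_css="body", timeout=10):
    """Render `url` in headless Chrome and return the body's innerHTML."""
    # Imported inside the function so this snippet loads even where Selenium is absent.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the JS-generated element appears (or raise TimeoutException).
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        # Read the DOM as it is *now*, after the page's scripts have run.
        return driver.execute_script("return document.body.innerHTML;")
    finally:
        driver.quit()  # always shut the browser down
```

Usage: `html = get_body_inner_html("https://example.com/", wait_css="#content")`, where `#content` is a hypothetical selector. `driver.page_source` is an alternative if you want the whole document rather than just the body.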

If most of the content is generated by JavaScript, the JavaScript may be making Ajax calls to retrieve that content. You may be able to call those server-side endpoints directly from your Python app.
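If the data comes from such an Ajax call, you can often find the request in your browser's developer tools (Network tab, filtered to XHR/Fetch) and request that URL yourself, skipping the browser entirely. A minimal standard-library sketch, assuming the endpoint returns JSON to a plain GET; `fetch_json` is a hypothetical helper, and real endpoints may additionally require cookies, tokens or other headers.

```python
import json
import urllib.request


def fetch_json(url: str, timeout: float = 10.0):
    """GET `url` and decode the response body as JSON."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return json.loads(response.read().decode(charset))
```

Usage: `data = fetch_json("https://example.com/api/items?page=1")`, where the URL is a hypothetical endpoint. Working with the JSON directly is usually easier than parsing the HTML that the page's JavaScript would have built from it.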

Do check that this doesn't violate the website's terms, though, and get permission.

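Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.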

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM