简体   繁体   中英

How can I get the HTML generated with javascript?

I want to get the HTML content of a web page but most of the content is generated by javascript.

Is it posible to get this generated HTML (with python if posible)?

The only way I know of to do this from your server is to run the page in an actual browser engine that will parse the HTML, build the normal DOM environment, run the javascript in the page and then reach into that DOM engine and get the innerHTML from the body tag.

This could be done by firing up Chrome with the appropriate URL from Python and then using a Chrome plugin to fetch the dynamically generated HTML after the page was done initializing itself and communicate back to your Python.

Checkout Selenium . It have a python driver, which might be what you're looking for.

If most of the content is generated by Javascript then the Javascript may be doing ajax calls to retrieve the content. You may be able to call those server side scripts from your Python app.

Do check that it doesn't violate the website's terms though and get permission.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM