
Scraping a webpage with JavaScript functions using Python

I need to retrieve some information from a webpage. The page works like a PowerPoint presentation: several slides are shown one by one. To move from one slide to the next you have to press a button that runs a JS function, "load_image_btn('plus')", which changes the image. The page URL stays exactly the same, and the only thing that changes in the HTML is the URL of the img, e.g. "someurl/546".

Is there any way to execute that function from Python iteratively so I can get all the images?

One generic way to cope with JavaScript-induced problems is to use a headless browser to fully execute each page and then scrape from there.

For my last similar project I used a service that provides instances of headless web browsers that can be controlled via an API, namely https://scrapinghub.com/splash .

But I am sure there are many alternatives.
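As a rough illustration of the headless-browser approach, here is a minimal sketch using Selenium, which is one of those alternatives and is my assumption, not something the question or the answer above names. It calls the page's own "load_image_btn('plus')" function in a loop and records the img URL after each call. The CSS selector "img", the delay, the slide limit, and the helper names `collect_image_urls` and `make_headless_chrome` are all hypothetical and would need adapting to the real page.

```python
import time


def collect_image_urls(driver, img_selector="img", max_slides=100, delay=0.5):
    """Step through the slides and return the image URLs in order.

    `driver` is any object with a Selenium-style execute_script(script, *args)
    method. `img_selector` is an assumed CSS selector for the slide image.
    """
    read_src = (
        "var el = document.querySelector(arguments[0]);"
        "return el ? el.src : null;"
    )
    urls = []
    for _ in range(max_slides):
        src = driver.execute_script(read_src, img_selector)
        # Stop when there is no image, or when the URL repeats
        # (the slideshow wrapped around or stayed on the last slide).
        if src is None or src in urls:
            break
        urls.append(src)
        # Run the same function the page's "next" button runs.
        driver.execute_script("load_image_btn('plus');")
        time.sleep(delay)  # crude wait for the page to swap the image
    return urls


def make_headless_chrome():
    """Create a headless Chrome driver (needs Selenium 4 and a local Chrome)."""
    from selenium import webdriver  # imported lazily: third-party dependency

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    return webdriver.Chrome(options=options)
```

To use it, create a driver with `make_headless_chrome()`, load the page with `driver.get(page_url)`, pass the driver to `collect_image_urls`, and call `driver.quit()` when done. The fixed sleep is the weakest part of the sketch: if the image loads slowly, the URL may not have changed yet and the loop will stop early, so an explicit wait on the src attribute would be more robust.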


 