Using chromedriver to download a generated PDF
Hello, I am new to web scraping. I am trying to use Google's web driver (chromedriver) to click a link that downloads the Batman movie script, but I have been running into errors. I read somewhere that because the file is generated rather than stored in a database, it might not be possible to download it with a web scraper. Can anyone help me?
I have the following Python script on Google Colab:
!pip install selenium
!apt-get update
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
import sys
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')
from selenium import webdriver
download_url = 'https://www.studiobinder.com/blog/batman-begins-script-screenplay-pdf-download'
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
wd = webdriver.Chrome('chromedriver',chrome_options=chrome_options)
wd.get(download_url)
button = wd.find_element_by_tag_name("Download PDF")
button.click()
wd.close()
There is no element with the tag name Download PDF on that web page. Download PDF is the visible text of a link, not a tag name; tag names are things like a, div, or button.
This is why your wd.find_element_by_tag_name("Download PDF") line throws an exception. Even if it did not, button would be a NoneType object, which has no click() method.
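To make the difference between a tag name and a link's visible text concrete, here is a minimal sketch that uses only the Python standard library. The HTML snippet, the example.com URL, and the LinkCollector class are all hypothetical; the real page's markup may differ and should be checked in the browser's inspector. In Selenium 4 the matching lookup would presumably be wd.find_element(By.LINK_TEXT, "Download PDF"), assuming the link text on the page is exactly that.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Record every tag name and every (link text, href) pair in a page."""

    def __init__(self):
        super().__init__()
        self.tag_names = set()   # e.g. {"div", "a"}
        self.links = []          # list of (visible text, href)
        self._in_link = False    # True while inside an <a> element
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        self.tag_names.add(tag)
        if tag == "a":
            self._in_link = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only collect text that sits inside a link.
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.links.append(("".join(self._text).strip(), self._href))
            self._in_link = False


# Hypothetical markup standing in for the download button on the page.
SAMPLE_HTML = """
<div class="cta">
  <a href="https://example.com/batman-begins.pdf">Download PDF</a>
</div>
"""

collector = LinkCollector()
collector.feed(SAMPLE_HTML)

# "Download PDF" is not a tag name, so a tag-name lookup can never match it.
print("Download PDF" in collector.tag_names)

# It is the text of an <a> element, so a link-text lookup finds it.
print(dict(collector.links).get("Download PDF"))
```

Looking the element up by its text rather than by tag name is the design choice that matters here; once the link is found, its href can also be fetched directly instead of clicking, if the page allows that.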