
Using chromedriver to download a generated PDF

Hello, I am new to web scraping. I am trying to use Google's web driver (chromedriver) to click on a link that downloads the Batman movie script, but I have been running into some errors. I read somewhere that because the file is generated on request instead of being stored in a database, it might not be possible to download it via a web scraper. Can anyone help me?
I have the following Python script on Google Colab:

!pip install selenium
!apt-get update 
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
import sys
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')
from selenium import webdriver

download_url = 'https://www.studiobinder.com/blog/batman-begins-script-screenplay-pdf-download'
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
wd = webdriver.Chrome('chromedriver',chrome_options=chrome_options)
wd.get(download_url)
button = wd.find_element_by_tag_name("Download PDF")
button.click()
wd.close()

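There is no element with the tag name Download PDF on that web page. A tag name is the name of an HTML element, such as a or button, not the text displayed on the page.
This is why your wd.find_element_by_tag_name("Download PDF") line throws an exception: when nothing matches, Selenium raises NoSuchElementException instead of returning None, so button.click() is never reached.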
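One way to locate the control by its visible text instead of by tag name is sketched below. It assumes Selenium 4 (the By-based find_elements API) and assumes that "Download PDF" is the visible text of a link or a button on the page; the page's actual markup is not shown in the question, so both locators are guesses. The helper name click_first_match is made up for this example. Whether the click then produces a downloadable file in headless Chrome is a separate matter that this sketch does not address.

```python
def click_first_match(driver, locators):
    """Click the first element found by any (by, value) locator.

    Returns the locator that matched, or None if nothing matched.
    click_first_match is a hypothetical helper, not part of Selenium.
    """
    for by, value in locators:
        # find_elements (plural) returns an empty list when nothing matches,
        # whereas find_element raises NoSuchElementException.
        matches = driver.find_elements(by, value)
        if matches:
            matches[0].click()
            return (by, value)
    return None


def main():
    # Imported here so the helper above can be read and tested on its own.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    download_url = 'https://www.studiobinder.com/blog/batman-begins-script-screenplay-pdf-download'

    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    wd = webdriver.Chrome(options=options)
    try:
        wd.get(download_url)
        # Assumed locators: the visible text of a link, then of a button.
        used = click_first_match(wd, [
            (By.PARTIAL_LINK_TEXT, 'Download PDF'),
            (By.XPATH, "//button[contains(normalize-space(.), 'Download PDF')]"),
        ])
        print('clicked via', used) if used else print('no matching element')
    finally:
        wd.quit()  # quit() ends the whole browser session, not just the window

# Call main() in a Colab cell to try it against the live page.
```

If neither locator matches, inspecting the page in a browser's developer tools shows which tag and attributes the download control really has, and the locator list can be adjusted to suit.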

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 