How to execute "javascript:__doPostBack" in a webpage to download PDF files using Selenium?

I have tried all the solutions from this very similar post, but unfortunately I get neither a helpful error nor any PDF files in my folder.

To change the configuration so that Selenium runs headless and downloads to a directory of my choice, I followed this post and this one.

However, I don't see anything happen. The behavior also differs between interactive execution and running a script. When executing interactively I see no error, but nothing happens either. When running a script I get a not-so-useful error:

WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, f"a[href*={css_selector}']"))).click()
  File "C----\selenium\webdriver\support\wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:

The website in question is here.

The code that I am trying to get working is:

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.headless = True

uri = "http://affidavitarchive.nic.in/CANDIDATEAFFIDAVIT.aspx?YEARID=March-2017+(+GEN+)&AC_No=1&st_code=S24&constType=AC"

driver = webdriver.Firefox(options=options, executable_path=r'C:\\Users\\xxx\\geckodriver.exe')

profile = webdriver.FirefoxProfile()
profile.set_preference('browser.download.folderList', 2) # custom location
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.download.dir', r'C:\\Users\\xxx\\Downloads')
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'application/pdf')

# Function that reads the table in the webpage and extracts the links for the pdfs
def get_links_from_table(uri):
    html = requests.get(uri)
    soup = BeautifulSoup(html.content, 'lxml')
    table = soup.find_all('table')[-1]
    candidate_affidavit_links = []
    for link in table.find_all('a'):
        candidate_affidavit_links.append(link.get('href'))
    return candidate_affidavit_links

candidate_affidavit_links_list = get_links_from_table(uri)

driver.get(uri)

# iterate over the javascript links and try to download the pdf files
for js_link in candidate_affidavit_links_list:
    css_selector = js_link.split("'")[1]
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, f"a[href*={css_selector}']"))).click()
    driver.execute_script(js_link)

If all of this can be done with Selenium alone, I would try this:

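import time

driver.get(uri)
# Wait until at least the first link in the last table is visible
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "(//table)[last()]//a")))
time.sleep(1)  # extra pause so the rest of the table can finish loading
candidate_affidavit_links = driver.find_elements(By.XPATH, "(//table)[last()]//a")
for link in candidate_affidavit_links:
    link.click()
    time.sleep(1)  # give each download time to run before the next click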

Open the page and wait until at least the first link in the table is visible. Then wait a little longer so that the whole table has loaded, collect all the a (link) elements into a list, and iterate over that list, clicking each element and pausing after each click so the download can complete.
You may need a longer delay after each click so that one file finishes downloading before the next download starts.
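Rather than guessing a fixed delay, one option is to poll the download folder after each click. The sketch below is only an illustration: the helper name wait_for_new_download and its parameters are hypothetical, and it assumes the browser marks unfinished downloads with a .part suffix (Firefox) or a .crdownload suffix (Chrome).

```python
import os
import time


def wait_for_new_download(directory, known_files, timeout=60.0,
                          poll_interval=0.5,
                          temp_suffixes=(".part", ".crdownload")):
    """Wait for a finished file in `directory` that is not in `known_files`.

    Returns the new file's name, or None if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        current = set(os.listdir(directory))
        # Any temporary file means some download is still in progress.
        pending = any(name.endswith(temp_suffixes) for name in current)
        new_files = sorted(name for name in current - set(known_files)
                           if not name.endswith(temp_suffixes))
        if new_files and not pending:
            return new_files[0]
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)
```

In the loop above, this could replace the second time.sleep(1): take before = set(os.listdir(download_dir)) just before link.click(), then call wait_for_new_download(download_dir, before) afterwards, where download_dir is whatever folder the browser was configured to save into.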
UPD
To disable the pop-ups asking whether to save the file and so on, try this. Instead of just

profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'application/pdf')

put this:

# Note: when the same preference is set twice, only the later value is kept.
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'application/csv,application/excel,application/vnd.ms-excel,application/vnd.msexcel,text/anytext,text/comma-separated-values,text/csv,text/plain,text/x-csv,application/x-csv,text/x-comma-separated-values,text/tab-separated-values,data:text/csv')
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'application/xml,text/plain,text/xml,image/jpeg,application/octet-stream,data:text/csv')
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.helperApps.neverAsk.openFile', 'application/csv,application/excel,application/vnd.ms-excel,application/vnd.msexcel,text/anytext,text/comma-separated-values,text/csv,text/plain,text/x-csv,application/x-csv,text/x-comma-separated-values,text/tab-separated-values,data:text/csv')
profile.set_preference('browser.helperApps.neverAsk.openFile', 'application/xml,text/plain,text/xml,image/jpeg,application/octet-stream,data:text/csv')
profile.set_preference('browser.helperApps.alwaysAsk.force', False)
profile.set_preference('browser.download.useDownloadDir', True)
profile.set_preference('dom.file.createInChild', True)

I am not sure you need all of this, but I have all of it set and it works for me.
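One thing worth checking: in the question's code the profile is created after the driver and is never passed to it, so those preferences may never reach the browser. A minimal sketch of one way to wire them in follows, assuming Selenium 4, where preferences can be set directly on the Firefox Options object and geckodriver is found automatically or is on PATH. The helper names build_download_prefs and make_firefox_driver are hypothetical, and setting pdfjs.disabled (to skip Firefox's built-in PDF viewer) is an assumption that may or may not be needed for this site.

```python
def build_download_prefs(download_dir,
                         mime_types=("application/pdf",
                                     "application/octet-stream")):
    """Return Firefox preferences for saving matching files without a prompt."""
    return {
        "browser.download.folderList": 2,  # 2 = use browser.download.dir
        "browser.download.dir": download_dir,
        "browser.download.useDownloadDir": True,
        "browser.download.manager.showWhenStarting": False,
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
        "pdfjs.disabled": True,  # assumption: bypass the built-in PDF viewer
    }


def make_firefox_driver(download_dir, headless=True):
    """Create a Firefox driver with the download preferences applied."""
    # Imported here so build_download_prefs works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    if headless:
        options.add_argument("-headless")
    # Preferences must be set before the driver is created.
    for name, value in build_download_prefs(download_dir).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```

The point of the design is only the ordering: every preference is applied to the options object first, and the driver is constructed from those options last.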

This is much simpler in Chrome:

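from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

# Allow downloads and choose the target folder via the Chrome DevTools Protocol
driver.execute_cdp_cmd("Page.setDownloadBehavior", {"behavior": "allow", "downloadPath": "/path/to/folder"})

driver.get("http://affidavitarchive.nic.in/CANDIDATEAFFIDAVIT.aspx?YEARID=March-2017+(+GEN+)&AC_No=1&st_code=S24&constType=AC")

# Click every link whose href calls __doPostBack
for a in driver.find_elements(By.CSS_SELECTOR, 'a[href*=doPostBack]'):
    a.click()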
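If clicking is unreliable, the postback can also be triggered directly, which is what the question's title asks about. The sketch below assumes the links have the usual ASP.NET form javascript:__doPostBack('target','argument') and that the page defines __doPostBack globally. The helper names are hypothetical, and the control ID in the comment is made up for illustration.

```python
import re

# Matches hrefs such as: javascript:__doPostBack('ctl00$Grid$ctl02$lnk','')
# (the control ID shown here is an invented example)
_POSTBACK_RE = re.compile(r"__doPostBack\(\s*'([^']*)'\s*,\s*'([^']*)'\s*\)")


def parse_postback(href):
    """Return (event_target, event_argument) from a postback href, or None."""
    match = _POSTBACK_RE.search(href or "")
    return match.groups() if match else None


def trigger_postback(driver, href):
    """Call the page's own __doPostBack with the arguments found in href."""
    parsed = parse_postback(href)
    if parsed is None:
        raise ValueError(f"not a __doPostBack link: {href!r}")
    target, argument = parsed
    # Passing values as script arguments avoids quoting problems with '$'.
    driver.execute_script(
        "__doPostBack(arguments[0], arguments[1]);", target, argument)
```

With a driver already on the page, one could collect the hrefs first with [a.get_attribute("href") for a in driver.find_elements(By.CSS_SELECTOR, "a[href*=doPostBack]")] and then call trigger_postback(driver, href) for each, pausing between calls so each download can start.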
