Download csv file with Python Selenium

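Good afternoon. I am trying to download a csv file programmatically with Python and Selenium, because I need to do this several hundred times. The manual steps to accomplish this are: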

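  1. Go to https://propertyinfo.revenue.wi.gov/WisconsinProd/search/advancedsearch.aspx?mode=advanced
  2. Select County Name in the drop-down list, type "IOWA" in the text box, and click Add
  3. Select Doc Number in the drop-down list, type "358407" in the text box, and click Add
  4. Click Submit
  5. Select the first row of the results table (a new page opens in the same tab)
  6. Highlight CSV Report on the right-hand side
  7. Click Go to save the file.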

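I have everything working programmatically through step 5, and I believe step 6 (the box2 lines below) also works. However, when I run the submit2 line in my code, nothing appears to download. I assume this is an easy catch/fix for someone who knows Selenium much better than I do. I also tried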

source = driver.find_element(By.ID, 'DTLNavigator_Report2_ReportsListBox')
action = webdriver.ActionChains(driver)
action.double_click(source)

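but that did not seem to work either. So either I have messed up the code or I cannot find the downloaded file. Any help you can provide would be greatly appreciated. I hope I have included enough information for you to go on.

Here is my code so far: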

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.select import Select
from selenium.webdriver.common.by import By
options = Options()
options.add_experimental_option("prefs", {"download.default_directory": r"D://Users//User//Downloads","download.prompt_for_download": False, "download.directory_upgrade": True, "safebrowsing.enabled": True})
options.headless = True
options.add_argument("--window-size=1920,1200")
DRIVER_PATH = "C://temp/webscraping/chromedriver.exe"
driver = webdriver.Chrome(options=options, executable_path=DRIVER_PATH)
driver.get("https://propertyinfo.revenue.wi.gov/WisconsinProd/Search/Disclaimer.aspx?FromUrl=../search/advancedsearch.aspx?mode=advanced")
wait = WebDriverWait(driver,60)
driver.get("https://propertyinfo.revenue.wi.gov/WisconsinProd/search/advancedsearch.aspx?mode=advanced")
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#btAgree"))).click()
box = Select(wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#sCriteria"))))
box.select_by_index(4)
iE = driver.find_element(By.ID, "txtCrit")
iE.send_keys('IOWA')
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#btAdd"))).click()
box = Select(wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#sCriteria"))))
box.select_by_index(3)
iE = driver.find_element(By.ID, "txtCrit")
iE.send_keys('358407')
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#btAdd"))).click()                                  
submit = driver.find_element(By.ID, "btSearch").click()
myTable = driver.find_element(By.CLASS_NAME, 'SearchResults')
dataSelect = myTable.click()

box2 = Select(wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#DTLNavigator_Report2_ReportsListBox"))))
box2.select_by_value('CSVMailingList')

submit2 = driver.find_element(By.ID, "ReportListButton").click()

One workaround for downloading files in headless mode is to set the download path through driver.command_executor.
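On newer Selenium releases the same DevTools command can, as far as I know, be sent without patching `command_executor`, because Chromium-based drivers expose `execute_cdp_cmd`. The sketch below wraps that call. The helper name `allow_headless_downloads` is hypothetical, and whether the call is needed at all depends on the Chrome and Selenium versions in use, so treat it as an illustration rather than a guaranteed fix.

```python
import os


def allow_headless_downloads(driver, download_dir=None):
    """Ask Chrome (via DevTools) to save downloads into download_dir.

    `driver` is assumed to be a Chromium-based Selenium WebDriver, which
    provides execute_cdp_cmd(cmd, cmd_args). The helper name is hypothetical.
    """
    # Default to the current working directory, as in the answer's example.
    download_dir = os.path.abspath(download_dir or os.getcwd())
    os.makedirs(download_dir, exist_ok=True)

    # Same DevTools command and parameters that the answer sends manually.
    params = {"behavior": "allow", "downloadPath": download_dir}
    driver.execute_cdp_cmd("Page.setDownloadBehavior", params)
    return download_dir
```

It would be called once, right after the driver is created and before the first `driver.get(...)`, for example `allow_headless_downloads(driver, os.getcwd())`.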

I was able to download the csv to the current directory in headless mode using the following code:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.select import Select
from selenium.webdriver.common.by import By
import os
import time

options = Options()
options.headless = True
options.add_argument("--window-size=1920,1200")

DRIVER_PATH = "C://temp/webscraping/chromedriver.exe"
driver = webdriver.Chrome(options=options, executable_path=DRIVER_PATH)

driver.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')
#set download path (set to current working directory in this example)
params = {'cmd': 'Page.setDownloadBehavior', 'params': {'behavior': 'allow','downloadPath':os.getcwd()}}
command_result = driver.execute("send_command", params)

driver.get("https://propertyinfo.revenue.wi.gov/WisconsinProd/Search/Disclaimer.aspx?FromUrl=../search/advancedsearch.aspx?mode=advanced")
wait = WebDriverWait(driver,60)
driver.get("https://propertyinfo.revenue.wi.gov/WisconsinProd/search/advancedsearch.aspx?mode=advanced")
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#btAgree"))).click()
box = Select(wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#sCriteria"))))
box.select_by_index(4)
iE = driver.find_element(By.ID, "txtCrit")
iE.send_keys('IOWA')
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#btAdd"))).click()
box = Select(wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#sCriteria"))))
box.select_by_index(3)
iE = driver.find_element(By.ID, "txtCrit")
iE.send_keys('358407')
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#btAdd"))).click()                                  
submit = driver.find_element(By.ID, "btSearch").click()
myTable = driver.find_element(By.CLASS_NAME, 'SearchResults')
dataSelect = myTable.click()

box2 = Select(wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#DTLNavigator_Report2_ReportsListBox"))))
box2.select_by_value('CSVMailingList')

submit2 = driver.find_element(By.ID, "ReportListButton").click()

# wait for csv download to complete
time.sleep(5)
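A fixed `time.sleep(5)` can be too short for a slow response and wastes time on a fast one, which adds up when the download is repeated hundreds of times. One possible alternative is to poll the download directory until a finished file appears. The helper name `wait_for_download` and its parameters are hypothetical. It assumes Chrome's usual `.crdownload` suffix for in-progress files and a directory that holds no older csv files from earlier runs.

```python
import os
import time


def wait_for_download(directory, suffix=".csv", timeout=60.0, poll=0.5):
    """Return the path of the newest finished `suffix` file in `directory`.

    A download counts as finished once a matching file exists and no
    Chrome temp files (*.crdownload) remain. Raises TimeoutError if that
    does not happen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(directory)
        in_progress = [n for n in names if n.endswith(".crdownload")]
        finished = [
            os.path.join(directory, n)
            for n in names
            if n.lower().endswith(suffix)
        ]
        if finished and not in_progress:
            # Several matches are possible; pick the most recently modified.
            return max(finished, key=os.path.getmtime)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no finished {suffix} file in {directory!r}")
        time.sleep(poll)
```

In the script above, `csv_path = wait_for_download(os.getcwd())` would take the place of the `time.sleep(5)` line. Downloading into a fresh, empty directory for each run keeps an older csv from being returned by mistake.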
