How to locate the four elements using selenium in python

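I am trying to post several parameters to this [url][1] and then press "Submit" to download the generated csv file.

I think at least 5 steps are needed.

Unfortunately, I don't think you will be able to do this with requests. As far as I can tell, no POST is made when "Submit" is clicked. All of the data appears to be generated by JavaScript, which requests cannot execute.

You could try a tool like Selenium to automate a browser (which can handle JS) and scrape the data from there.

Try this. It is only the gist; you will need to handle the rest according to your needs. It produces the results shown below: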

import requests 

url = "http://nxsa.esac.esa.int/nxsa-sl/servlet/observations-metadata?RESOURCE_CLASS=OBSERVATION&ADQLQUERY=SELECT%20DISTINCT%20OBSERVATION.OBSERVATION_OID,OBSERVATION.MOVING_TARGET,OBSERVATION.OBSERVATION_ID,EPIC_OBSERVATION_IMAGE.ICON,EPIC_OBSERVATION_IMAGE.ICON_PREVIEW,RGS_FLUXED_OBSERVATION_IMAGE.ICON,RGS_FLUXED_OBSERVATION_IMAGE.ICON_PREVIEW,EPIC_MOVING_TARGET_OBSERVATION_IMAGE.ICON,EPIC_MOVING_TARGET_OBSERVATION_IMAGE.ICON_PREVIEW,RGS_FLUXED_MOVING_TARGET_OBSERVATION_IMAGE.ICON,RGS_FLUXED_MOVING_TARGET_OBSERVATION_IMAGE.ICON_PREVIEW,OM_OBSERVATION_IMAGE.ICON_PREVIEW_V,OM_OBSERVATION_IMAGE.ICON_PREVIEW_B,OM_OBSERVATION_IMAGE.ICON_PREVIEW_L,OM_OBSERVATION_IMAGE.ICON_PREVIEW_U,OM_OBSERVATION_IMAGE.ICON_PREVIEW_M,OM_OBSERVATION_IMAGE.ICON_PREVIEW_S,OM_OBSERVATION_IMAGE.ICON_PREVIEW_W,OM_OBSERVATION_IMAGE.ICON_V,OM_OBSERVATION_IMAGE.ICON_B,OM_OBSERVATION_IMAGE.ICON_L,OM_OBSERVATION_IMAGE.ICON_U,OM_OBSERVATION_IMAGE.ICON_M,OM_OBSERVATION_IMAGE.ICON_S,OM_OBSERVATION_IMAGE.ICON_W,OBSERVATION.REVOLUTION,OBSERVATION.PROPRIETARY_END_DATE,OBSERVATION.RA_NOM,OBSERVATION.DEC_NOM,OBSERVATION.POSITION_ANGLE,OBSERVATION.START_UTC,OBSERVATION.END_UTC,OBSERVATION.DURATION,OBSERVATION.TARGET,PROPOSAL.TYPE,PROPOSAL.CATEGORY,PROPOSAL.AO,PROPOSAL.PI_FIRST_NAME,PROPOSAL.PI_SURNAME,TARGET_TYPE.DESCRIPTION,OBSERVATION.LII,OBSERVATION.BII,OBSERVATION.ODF_VERSION,OBSERVATION.PPS_VERSION,OBSERVATION.COORD_OBS,OBSERVATION.COORD_TYPE%20FROM%20FIELD_NOT_USED%20%20WHERE%20OBSERVATION.PROPRIETARY_END_DATE%3E%272017-10-18%27%20%20AND%20%20(PROPOSAL.TYPE=%27Calibration%27%20OR%20PROPOSAL.TYPE=%27Int%20Calibration%27%20OR%20PROPOSAL.TYPE=%27Co-Chandra%27%20OR%20PROPOSAL.TYPE=%27Co-ESO%27%20OR%20PROPOSAL.TYPE=%27GO%27%20OR%20PROPOSAL.TYPE=%27HST%27%20OR%20PROPOSAL.TYPE=%27Large%27%20OR%20PROPOSAL.TYPE=%27Large-Joint%27%20OR%20PROPOSAL.TYPE=%27Triggered%27%20OR%20PROPOSAL.TYPE=%27Target-Opportunity%27%20OR%20PROPOSAL.TYPE=%27TOO%27%20OR%20PROPOSAL.TYPE=%27Triggered-Joint%27)%20%20%20ORDER%20BY%20OBSERVATION.OBSERVATION_ID&PAGE=1&PAGE_SIZE=100&RETURN_TYPE=JSON"
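res = requests.get(url)
res.raise_for_status()  # stop on an HTTP error instead of failing later in res.json()
data = res.json()
result = data['data']  # list of rows, one dict per observation

for item in result:
    ID = item['OBSERVATION__OBSERVATION_ID']
    Surname = item['PROPOSAL__PI_SURNAME']
    Name = item['PROPOSAL__PI_FIRST_NAME']
    print(ID, Surname, Name)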

Partial results (ID and name):

0740071301 La Palombara Nicola
0741732601 Kaspi Victoria
0741732701 Kaspi Victoria
0741732801 Kaspi Victoria
0742150101 Grosso Nicolas
0742240801 Roberts Timothy
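
Since the goal is a csv file, the rows could also be written out with the standard csv module instead of being printed. This is only a minimal sketch: rows_to_csv and FIELDS are hypothetical names, the sample list stands in for data['data'], and the IDs are assumed to arrive as strings (the leading zeros in the output suggest so).

```python
import csv
import io

# Keys taken from the script above; add more columns as needed.
FIELDS = ['OBSERVATION__OBSERVATION_ID', 'PROPOSAL__PI_SURNAME', 'PROPOSAL__PI_FIRST_NAME']


def rows_to_csv(rows, out):
    """Write the FIELDS columns of each row dict to the file-like object `out`."""
    # extrasaction='ignore' drops the many other keys each row may carry
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction='ignore')
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Stand-in for data['data']; values copied from the partial results shown above.
sample = [
    {'OBSERVATION__OBSERVATION_ID': '0740071301',
     'PROPOSAL__PI_SURNAME': 'La Palombara',
     'PROPOSAL__PI_FIRST_NAME': 'Nicola'},
    {'OBSERVATION__OBSERVATION_ID': '0741732601',
     'PROPOSAL__PI_SURNAME': 'Kaspi',
     'PROPOSAL__PI_FIRST_NAME': 'Victoria'},
]
buf = io.StringIO()
rows_to_csv(sample, buf)
csv_text = buf.getvalue()
```

To write to disk instead, pass a file opened with open(path, 'w', newline='') in place of the StringIO buffer.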

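By the way, when you reach the target page you will see two tabs there. This result comes from the (Observations) tab. The link I used above can also be found in Chrome developer tools.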
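
The long link is easier to adapt if it is assembled from its parts. The sketch below is an assumption-laden illustration, not something tested against the service: build_url is a hypothetical helper, the parameter names (RESOURCE_CLASS, ADQLQUERY, PAGE, PAGE_SIZE, RETURN_TYPE) and the query text are copied from the link above, and urlencode percent-encodes a few more characters (commas, parentheses, "=") than the original link does, which a server should decode to the same query.

```python
from urllib.parse import urlencode, quote

BASE = 'http://nxsa.esac.esa.int/nxsa-sl/servlet/observations-metadata'


def build_url(columns, proposal_types, end_date, page=1, page_size=100):
    """Assemble a metadata URL from readable parts (hypothetical helper)."""
    type_filter = ' OR '.join("PROPOSAL.TYPE='{}'".format(t) for t in proposal_types)
    query = (
        'SELECT DISTINCT {cols} FROM FIELD_NOT_USED '
        "WHERE OBSERVATION.PROPRIETARY_END_DATE>'{date}' AND ({types}) "
        'ORDER BY OBSERVATION.OBSERVATION_ID'
    ).format(cols=','.join(columns), date=end_date, types=type_filter)
    params = [
        ('RESOURCE_CLASS', 'OBSERVATION'),
        ('ADQLQUERY', query),
        ('PAGE', page),
        ('PAGE_SIZE', page_size),
        ('RETURN_TYPE', 'JSON'),
    ]
    # quote_via=quote encodes spaces as %20 (as in the link above) rather than '+'
    return BASE + '?' + urlencode(params, quote_via=quote)


url = build_url(
    ['OBSERVATION.OBSERVATION_ID', 'PROPOSAL.PI_SURNAME'],
    ['GO', 'Int Calibration'],
    '2017-10-18',
)
```

Changing page or page_size would then be a matter of passing different arguments, assuming the service treats PAGE as a 1-based page index, which the link suggests but does not prove.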

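Since no one has posted a solution yet, here is one. You will not get far with requests, so selenium is your best option. If you want to use the script below without any modifications, check that:

  • you are on Linux or macOS, or change dl_dir = '/tmp' to a directory of your choice
  • you have chromedriver installed, or change the driver to Firefox in the code (and adapt the download directory configuration to what Firefox needs)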
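
For the Firefox variant mentioned in the last point, the download directory is set through profile preferences rather than Chrome's prefs dictionary. The following is an untested sketch under several assumptions: firefox_download_prefs and make_firefox_driver are hypothetical names, the site is assumed to serve the file as text/csv, geckodriver is assumed to be installed, and the constructor call follows the selenium 3.x API (later selenium versions changed how profiles are passed).

```python
def firefox_download_prefs(dl_dir):
    """Firefox preferences that save CSV downloads to dl_dir without a dialog."""
    return {
        # 2 means "use the custom directory given in browser.download.dir"
        'browser.download.folderList': 2,
        'browser.download.dir': dl_dir,
        # MIME types to save without asking; text/csv is an assumption here
        'browser.helperApps.neverAsk.saveToDisk': 'text/csv',
    }


def make_firefox_driver(dl_dir):
    """Create a Firefox driver configured with the preferences above."""
    # imported here so the preference helper can be used without selenium installed
    from selenium import webdriver

    profile = webdriver.FirefoxProfile()
    for name, value in firefox_download_prefs(dl_dir).items():
        profile.set_preference(name, value)
    return webdriver.Firefox(firefox_profile=profile)
```

If the download dialog still appears, the MIME type reported by the server probably differs from the one assumed above; the network tab of the browser's developer tools shows the actual Content-Type.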

This is the environment the script was tested in:

$ python -V
Python 3.5.3
$ chromedriver --version
ChromeDriver 2.33.506106 (8a06c39c4582fbfbab6966dbb1c38a9173bfb1a2)
$ pip list --format=freeze | grep selenium
selenium==3.6.0

I commented almost every line, so I will let the code do the talking:

import os
import time
from selenium import webdriver
from selenium.webdriver.common import by
from selenium.webdriver.support import ui, expected_conditions as EC


def main():
    dl_dir = '/tmp'  # temporary download dir so I don't spam the real dl dir with csv files
    # check what files are downloaded before the scraping starts (will be explained later)
    csvs_old = {file for file in os.listdir(dl_dir) if file.startswith('NXSA-Results-') and file.endswith('.csv')}

    # I use chrome so check if you have chromedriver installed
    # pass custom dl dir to browser instance
    chrome_options = webdriver.ChromeOptions()
    prefs = {'download.default_directory': dl_dir}
    chrome_options.add_experimental_option('prefs', prefs)
    driver = webdriver.Chrome(chrome_options=chrome_options)
    # open page
    driver.get('http://nxsa.esac.esa.int/nxsa-web/#search')

    # wait for search ui to appear (abort after 10 secs)
    # once there, unfold the filters panel
    ui.WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((by.By.XPATH, '//td[text()="Observation and Proposal filters"]'))).click()
    # toggle observation availability dropdown
    driver.find_element_by_xpath('//input[@title="Observation Availability"]/../../td[2]/div/img').click()
    # wait until the dropdown elements are available, then click "proprietary"
    ui.WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((by.By.XPATH, '//div[text()="Proprietary" and @class="gwt-Label"]'))).click()
    # unfold display options panel
    driver.find_element_by_xpath('//td[text()="Display options"]').click()
    # deselect "pointed observations"
    driver.find_element_by_id('gwt-uid-241').click()
    # select "epic exposures"
    driver.find_element_by_id('gwt-uid-240').click()

    # uncomment if you want to go through the activated settings and verify them
    # when commented, the form is submitted immediately
    #time.sleep(5)

    # submit the form
    driver.find_element_by_xpath('//button/span[text()="Submit"]/../img').click()
    # wait until the results table has at least one row
    ui.WebDriverWait(driver, 10).until(EC.presence_of_element_located((by.By.XPATH, '//tr[@class="MPI"]')))
    # click on save
    driver.find_element_by_xpath('//span[text()="Save table as"]').click()
    # wait for dropdown with "CSV" entry to appear
    el = ui.WebDriverWait(driver, 10).until(EC.element_to_be_clickable((by.By.XPATH, '//a[@title="Save as CSV, Comma Separated Values"]')))
    # somehow, the clickability does not suffice - selenium still whines about the wrong element being clicked
    # as a dirty workaround, wait a fixed amount of time to let js finish ui update
    time.sleep(1)
    # click on "CSV" entry
    el.click()

    # now. selenium can't tell whether the file is being downloaded
    # we have to do it ourselves
    # this is a quick-and-dirty check that waits until a new csv file appears in the dl dir
    # replace with watchdogs or whatever
    dl_max_wait_time = 10  # secs
    seconds = 0
    while seconds < dl_max_wait_time:
        time.sleep(1)
        csvs_new = {file for file in os.listdir(dl_dir) if file.startswith('NXSA-Results-') and file.endswith('.csv')}
        if csvs_new - csvs_old:  # new file found in dl dir
            print('Downloaded file should be one of {}'.format([os.path.join(dl_dir, file) for file in csvs_new - csvs_old]))
            break
        seconds += 1

    # we're done, so close the browser and stop the chromedriver process
    driver.quit()


# script entry point
if __name__ == '__main__':
    main()

If everything works, the script should output:

Downloaded file should be one of ['/tmp/NXSA-Results-1509061710475.csv']
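
The quick-and-dirty download check in the script could be pulled out into a small reusable function. This is one possible sketch, not part of the original answer: wait_for_new_file and its parameters are hypothetical names, and it keeps the same idea of comparing directory listings before and after the click. Chrome writes unfinished downloads with a .crdownload suffix, so matching on .csv skips files that are still in progress.

```python
import os
import time


def wait_for_new_file(directory, known, prefix='NXSA-Results-', suffix='.csv',
                      timeout=10.0, poll=0.2):
    """Return the path of a matching file that is not in `known`, or None on timeout.

    `known` is the set of matching file names that existed before the download started.
    """
    deadline = time.monotonic() + timeout
    while True:
        current = {name for name in os.listdir(directory)
                   if name.startswith(prefix) and name.endswith(suffix)}
        new = sorted(current - known)
        if new:
            return os.path.join(directory, new[0])
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

In the script, this would be called as wait_for_new_file(dl_dir, csvs_old) right after the click on the "CSV" entry; a None result means no file appeared within the timeout.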
