
Python Selenium Webscraping: find_elements_by_xpath returning an empty list

I've taken a few coding subjects at university and am trying to analyse tennis statistics by learning Selenium, which is completely new to me.

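The page I'm working with is here ( https://www.atptour.com/en/scores/results-archive?year=2021 ), and I'm following the guides from this site ( https://www.scrapingbee.com/blog/selenium-python/ and https://www.scrapingbee.com/blog/practical-xpath-for-web-scraping/ ). The particular problem I'm running into is under the subheading "E-commerce product data extraction" in the second guide.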

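My goal is to iterate through the tournaments and extract the links attached to the "Results" buttons, but I'm having trouble because my program just gives me an empty list.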

from selenium import webdriver
from selenium.webdriver.chrome.options import Options


DRIVER_PATH = r"C:\Program Files (x86)\chromedriver.exe"  # raw string so the backslashes are not read as escapes
#driver = webdriver.Chrome(executable_path=DRIVER_PATH)
options = Options()
options.headless = True
options.add_argument("--window-size=1920,1200")
driver = webdriver.Chrome(options=options, executable_path=DRIVER_PATH)
#driver.get("https://www.nintendo.com/")
#print(driver.page_source)
#driver.quit()
# 1 Data Collection
# 1.1 Find Links to All Tournaments
tournaments_2021_url = "https://www.atptour.com/en/scores/results-archive?year=2021"
#tournament_class = "tourney-result"
driver.get(tournaments_2021_url) # print(driver.page_source)
tournaments_2021_url_list = driver.find_elements_by_xpath("//a[@class='button-border']")
print("\n tournament urls \n")
print(tournaments_2021_url_list)
print(len(tournaments_2021_url_list))
driver.quit()
# 1.2 For Each Tournament, Find Links to Each Match
# 1.3 For Each Match, Extract Relevant Statistics

I expected a list of elements, or some odd objects, from which I could extract the links, but I get an empty list of length 0. Thanks for your help.

To print the values of the href attribute of all the RESULTS links, you need to induce WebDriverWait for visibility_of_all_elements_located(), and you can use any of the following Locator Strategies:

  • Using PARTIAL_LINK_TEXT:

     driver.get("https://www.atptour.com/en/scores/results-archive?year=2021")
     print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.PARTIAL_LINK_TEXT, "Results")))])
     driver.quit()

  • Using CSS_SELECTOR:

     driver.get("https://www.atptour.com/en/scores/results-archive?year=2021")
     print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a[href$='results']")))])
     driver.quit()

  • Using XPATH:

     driver.get("https://www.atptour.com/en/scores/results-archive?year=2021")
     print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//a[normalize-space()='Results']")))])
     driver.quit()

  • Console Output:

     ['https://www.atptour.com/en/scores/archive/delray-beach/499/2021/results', 'https://www.atptour.com/en/scores/archive/antalya/9426/2021/results', 'https://www.atptour.com/en/scores/archive/auckland/301/2021/results', 'https://www.atptour.com/en/scores/archive/melbourne/8998/2021/results', 'https://www.atptour.com/en/scores/archive/melbourne/9428/2021/results', 'https://www.atptour.com/en/scores/archive/pune/891/2021/results', 'https://www.atptour.com/en/scores/archive/atp-cup/8888/2021/results', 'https://www.atptour.com/en/scores/archive/australian-open/580/2021/results', 'https://www.atptour.com/en/scores/archive/new-york/424/2021/results', 'https://www.atptour.com/en/scores/archive/rio-de-janeiro/6932/2021/results', 'https://www.atptour.com/en/scores/archive/singapore/9460/2021/results', 'https://www.atptour.com/en/scores/archive/cordoba/9158/2021/results', 'https://www.atptour.com/en/scores/archive/montpellier/375/2021/results', 'https://www.atptour.com/en/scores/archive/rotterdam/407/2021/results', 'https://www.atptour.com/en/scores/archive/buenos-aires/506/2021/results', 'https://www.atptour.com/en/scores/archive/doha/451/2021/results', 'https://www.atptour.com/en/scores/archive/marseille/496/2021/results', 'https://www.atptour.com/en/scores/archive/santiago/8996/2021/results', 'https://www.atptour.com/en/scores/archive/dubai/495/2021/results', 'https://www.atptour.com/en/scores/archive/acapulco/807/2021/results', 'https://www.atptour.com/en/scores/archive/miami/403/2021/results', 'https://www.atptour.com/en/scores/archive/marrakech/360/2021/results', 'https://www.atptour.com/en/scores/archive/cagliari/9481/2021/results', 'https://www.atptour.com/en/scores/archive/marbella/9462/2021/results', 'https://www.atptour.com/en/scores/archive/houston/717/2021/results', 'https://www.atptour.com/en/scores/archive/monte-carlo/410/2021/results', 'https://www.atptour.com/en/scores/archive/barcelona/425/2021/results', 
'https://www.atptour.com/en/scores/archive/belgrade/5053/2021/results', 'https://www.atptour.com/en/scores/archive/estoril/7290/2021/results', 'https://www.atptour.com/en/scores/archive/munich/308/2021/results', 'https://www.atptour.com/en/scores/archive/madrid/1536/2021/results', 'https://www.atptour.com/en/scores/archive/rome/416/2021/results', 'https://www.atptour.com/en/scores/archive/geneva/322/2021/results', 'https://www.atptour.com/en/scores/archive/lyon/7694/2021/results', 'https://www.atptour.com/en/scores/archive/parma/9510/2021/results', 'https://www.atptour.com/en/scores/archive/belgrade/9512/2021/results', 'https://www.atptour.com/en/scores/archive/roland-garros/520/2021/results', 'https://www.atptour.com/en/scores/archive/s-hertogenbosch/440/2021/results', 'https://www.atptour.com/en/scores/archive/stuttgart/321/2021/results', 'https://www.atptour.com/en/scores/archive/halle/500/2021/results', 'https://www.atptour.com/en/scores/archive/london/311/2021/results', 'https://www.atptour.com/en/scores/archive/mallorca/8994/2021/results', 'https://www.atptour.com/en/scores/archive/eastbourne/741/2021/results', 'https://www.atptour.com/en/scores/archive/wimbledon/540/2021/results', 'https://www.atptour.com/en/scores/archive/hamburg/414/2021/results', 'https://www.atptour.com/en/scores/archive/newport/315/2021/results', 'https://www.atptour.com/en/scores/archive/bastad/316/2021/results', 'https://www.atptour.com/en/scores/archive/los-cabos/7480/2021/results', 'https://www.atptour.com/en/scores/archive/gstaad/314/2021/results', 'https://www.atptour.com/en/scores/archive/umag/439/2021/results', 'https://www.atptour.com/en/scores/archive/tokyo/96/2021/results', 'https://www.atptour.com/en/scores/archive/atlanta/6116/2021/results', 'https://www.atptour.com/en/scores/archive/kitzbuhel/319/2021/results', 'https://www.atptour.com/en/scores/archive/washington/418/2021/results', 'https://www.atptour.com/en/scores/archive/toronto/421/2021/results', 
'https://www.atptour.com/en/scores/archive/cincinnati/422/2021/results', 'https://www.atptour.com/en/scores/archive/winston-salem/6242/2021/results', 'https://www.atptour.com/en/scores/archive/us-open/560/2021/results', 'https://www.atptour.com/en/scores/archive/nur-sultan/9410/2021/results', 'https://www.atptour.com/en/scores/archive/metz/341/2021/results', 'https://www.atptour.com/en/scores/archive/laver-cup/9210/2021/results', 'https://www.atptour.com/en/scores/archive/san-diego/9569/2021/results', 'https://www.atptour.com/en/scores/archive/sofia/7434/2021/results', 'https://www.atptour.com/en/scores/archive/chengdu/7581/2021/results', 'https://www.atptour.com/en/scores/archive/zhuhai/9164/2021/results', 'https://www.atptour.com/en/scores/archive/shanghai/5014/2021/results', 'https://www.atptour.com/en/scores/archive/beijing/747/2021/results', 'https://www.atptour.com/en/scores/archive/tokyo/329/2021/results', 'https://www.atptour.com/en/scores/archive/indian-wells/404/2021/results', 'https://www.atptour.com/en/scores/archive/moscow/438/2021/results', 'https://www.atptour.com/en/scores/archive/antwerp/7485/2021/results', 'https://www.atptour.com/en/scores/archive/vienna/337/2021/results', 'https://www.atptour.com/en/scores/archive/st-petersburg/568/2021/results', 'https://www.atptour.com/en/scores/archive/basel/328/2021/results', 'https://www.atptour.com/en/scores/archive/paris/352/2021/results', 'https://www.atptour.com/en/scores/archive/stockholm/429/2021/results', 'https://www.atptour.com/en/scores/archive/intesa-sanpaolo-next-gen-atp-finals/7696/2021/results', 'https://www.atptour.com/en/scores/archive/nitto-atp-finals/605/2021/results']
  • Note: You have to add the following imports:

     from selenium.webdriver.support.ui import WebDriverWait
     from selenium.webdriver.common.by import By
     from selenium.webdriver.support import expected_conditions as EC
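
Putting the imports and one of the locator strategies together, a minimal end-to-end sketch might look like the following. It assumes Selenium 4 and a chromedriver that Selenium can find without an explicit path; the helper names `extract_hrefs` and `collect_result_links` are made up for illustration and are not part of Selenium.

```python
def extract_hrefs(elements):
    """Return the href of each element, skipping elements that have none."""
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")
        if href:
            hrefs.append(href)
    return hrefs


def collect_result_links(url, timeout=20):
    """Open url in headless Chrome and return the hrefs of the Results links."""
    # Imported here so extract_hrefs can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")
    options.add_argument("--window-size=1920,1200")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the links are visible; raises TimeoutException
        # if nothing matching shows up within `timeout` seconds.
        links = WebDriverWait(driver, timeout).until(
            EC.visibility_of_all_elements_located(
                (By.CSS_SELECTOR, "a[href$='results']")
            )
        )
        return extract_hrefs(links)
    finally:
        driver.quit()  # release the browser even if the wait times out
```

Calling `collect_result_links("https://www.atptour.com/en/scores/results-archive?year=2021")` should then give a list like the console output above, provided the page still has the same structure.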

Here's the updated code to add to the base code:

from selenium.webdriver.common.by import By
tournaments_2021_url = "https://www.atptour.com/en/scores/results-archive?year=2021"
self.driver.get(tournaments_2021_url)
tournaments_2021_url_list = self.driver.find_elements(By.XPATH, "//a[@class='button-border']")
print("\nTournament URLs:\n")
for row in tournaments_2021_url_list:
    print(row.get_attribute("href"))
print("\nNumber of rows:")
print(len(tournaments_2021_url_list))

Here's the output after running everything:

Tournament URLs:

https://www.atptour.com/en/scores/archive/delray-beach/499/2021/results
https://www.atptour.com/en/scores/archive/antalya/9426/2021/results
https://www.atptour.com/en/scores/archive/auckland/301/2021/results
https://www.atptour.com/en/scores/archive/melbourne/8998/2021/results
https://www.atptour.com/en/scores/archive/melbourne/9428/2021/results
https://www.atptour.com/en/scores/archive/pune/891/2021/results
https://www.atptour.com/en/scores/archive/atp-cup/8888/2021/results
https://www.atptour.com/en/scores/archive/australian-open/580/2021/results
https://www.atptour.com/en/scores/archive/new-york/424/2021/results
https://www.atptour.com/en/scores/archive/rio-de-janeiro/6932/2021/results
https://www.atptour.com/en/scores/archive/singapore/9460/2021/results
https://www.atptour.com/en/scores/archive/cordoba/9158/2021/results
https://www.atptour.com/en/scores/archive/montpellier/375/2021/results
https://www.atptour.com/en/scores/archive/rotterdam/407/2021/results
https://www.atptour.com/en/scores/archive/buenos-aires/506/2021/results
https://www.atptour.com/en/scores/archive/doha/451/2021/results
https://www.atptour.com/en/scores/archive/marseille/496/2021/results
https://www.atptour.com/en/scores/archive/santiago/8996/2021/results
https://www.atptour.com/en/scores/archive/dubai/495/2021/results
https://www.atptour.com/en/scores/archive/acapulco/807/2021/results
https://www.atptour.com/en/scores/archive/miami/403/2021/results
https://www.atptour.com/en/scores/archive/marrakech/360/2021/results
https://www.atptour.com/en/scores/archive/cagliari/9481/2021/results
https://www.atptour.com/en/scores/archive/marbella/9462/2021/results
https://www.atptour.com/en/scores/archive/houston/717/2021/results
https://www.atptour.com/en/scores/archive/monte-carlo/410/2021/results
https://www.atptour.com/en/scores/archive/barcelona/425/2021/results
https://www.atptour.com/en/scores/archive/belgrade/5053/2021/results
https://www.atptour.com/en/scores/archive/estoril/7290/2021/results
https://www.atptour.com/en/scores/archive/munich/308/2021/results
https://www.atptour.com/en/scores/archive/madrid/1536/2021/results
https://www.atptour.com/en/scores/archive/rome/416/2021/results
https://www.atptour.com/en/scores/archive/geneva/322/2021/results
https://www.atptour.com/en/scores/archive/lyon/7694/2021/results
https://www.atptour.com/en/scores/archive/parma/9510/2021/results
https://www.atptour.com/en/scores/archive/belgrade/9512/2021/results
https://www.atptour.com/en/scores/archive/roland-garros/520/2021/results
https://www.atptour.com/en/scores/archive/s-hertogenbosch/440/2021/results
https://www.atptour.com/en/scores/archive/stuttgart/321/2021/results
https://www.atptour.com/en/scores/archive/halle/500/2021/results
https://www.atptour.com/en/scores/archive/london/311/2021/results
https://www.atptour.com/en/scores/archive/mallorca/8994/2021/results
https://www.atptour.com/en/scores/archive/eastbourne/741/2021/results
https://www.atptour.com/en/scores/archive/wimbledon/540/2021/results
https://www.atptour.com/en/scores/archive/hamburg/414/2021/results
https://www.atptour.com/en/scores/archive/newport/315/2021/results
https://www.atptour.com/en/scores/archive/bastad/316/2021/results
https://www.atptour.com/en/scores/archive/los-cabos/7480/2021/results
https://www.atptour.com/en/scores/archive/gstaad/314/2021/results
https://www.atptour.com/en/scores/archive/umag/439/2021/results
https://www.atptour.com/en/scores/archive/tokyo/96/2021/results
https://www.atptour.com/en/scores/archive/atlanta/6116/2021/results
https://www.atptour.com/en/scores/archive/kitzbuhel/319/2021/results
https://www.atptour.com/en/scores/archive/washington/418/2021/results
https://www.atptour.com/en/scores/archive/toronto/421/2021/results
https://www.atptour.com/en/scores/archive/cincinnati/422/2021/results
https://www.atptour.com/en/scores/archive/winston-salem/6242/2021/results
https://www.atptour.com/en/scores/archive/us-open/560/2021/results
https://www.atptour.com/en/scores/archive/nur-sultan/9410/2021/results
https://www.atptour.com/en/scores/archive/metz/341/2021/results
https://www.atptour.com/en/scores/archive/laver-cup/9210/2021/results
https://www.atptour.com/en/scores/archive/san-diego/9569/2021/results
https://www.atptour.com/en/scores/archive/sofia/7434/2021/results
https://www.atptour.com/en/scores/archive/chengdu/7581/2021/results
https://www.atptour.com/en/scores/archive/zhuhai/9164/2021/results
https://www.atptour.com/en/scores/archive/shanghai/5014/2021/results
https://www.atptour.com/en/scores/archive/beijing/747/2021/results
https://www.atptour.com/en/scores/archive/tokyo/329/2021/results
https://www.atptour.com/en/scores/archive/indian-wells/404/2021/results
https://www.atptour.com/en/scores/archive/moscow/438/2021/results
https://www.atptour.com/en/scores/archive/antwerp/7485/2021/results
https://www.atptour.com/en/scores/archive/vienna/337/2021/results
https://www.atptour.com/en/scores/archive/st-petersburg/568/2021/results
https://www.atptour.com/en/scores/archive/basel/328/2021/results
https://www.atptour.com/en/scores/archive/paris/352/2021/results
https://www.atptour.com/en/scores/archive/stockholm/429/2021/results
https://www.atptour.com/en/scores/archive/intesa-sanpaolo-next-gen-atp-finals/7696/2021/results
https://www.atptour.com/en/scores/archive/nitto-atp-finals/605/2021/results

Number of rows:
78

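This also has the added benefit of avoiding a deprecation warning. driver.find_elements_by_xpath is deprecated, and a warning message is displayed after running pytest. The newer driver.find_elements(By.XPATH, XPATH) avoids that, although it does add an extra import line, from selenium.webdriver.common.by import By, to the code.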
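
For a plain script rather than a test class, the same idea could be sketched as below. The snippet above uses self.driver, which presumably comes from a pytest-style test class; here a plain driver is passed in instead. The sketch also swaps executable_path for a Service object, since executable_path is deprecated in Selenium 4 as well. The function names `chrome_arguments`, `make_driver` and `result_links` are hypothetical helpers, not Selenium APIs.

```python
def chrome_arguments(headless=True, width=1920, height=1200):
    """Build the Chrome command-line arguments used in the question's script."""
    args = [f"--window-size={width},{height}"]
    if headless:
        args.append("--headless")
    return args


def make_driver(driver_path, headless=True):
    """Create a Chrome driver with a Service object instead of executable_path."""
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)
    return webdriver.Chrome(service=Service(executable_path=driver_path), options=options)


def result_links(driver, url):
    """Load url and return the href of every link matched by the question's XPath."""
    from selenium.webdriver.common.by import By

    driver.get(url)
    # Non-deprecated replacement for driver.find_elements_by_xpath(...)
    elements = driver.find_elements(By.XPATH, "//a[@class='button-border']")
    return [element.get_attribute("href") for element in elements]
```

Passing `headless=False` to `make_driver` opens a visible browser window, which can make it easier to see what the page actually rendered.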

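I took your code and ran it, and it works fine. It does what it's supposed to do. So my suggestion is to run it through a debugger and step through it to make sure everything goes as expected. Also remove the headless option so you can confirm visually. Check your Chrome browser version and make sure it matches the chromedriver you're using. (Although if the versions didn't match, it should give you an error message.) Finally, if all else fails, try it with another browser, such as Firefox with the appropriate geckodriver.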
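
Some of these checks can also be done from code. The sketch below is only one way to go about it: `major_version`, `versions_match` and `diagnose` are hypothetical helper names, and the capability keys `browserVersion` and `chrome.chromedriverVersion` are what Chrome sessions normally report, so treat them as an assumption for other browsers.

```python
def major_version(version):
    """Return the leading integer of a dotted version string such as '114.0.5735.90'."""
    return int(version.split(".")[0])


def versions_match(browser_version, driver_version):
    """True when the browser and the driver share the same major version."""
    return major_version(browser_version) == major_version(driver_version)


def diagnose(driver, xpath):
    """Collect a few facts about the loaded page that may explain an empty result."""
    caps = driver.capabilities
    browser_version = caps.get("browserVersion", "")
    # chromedriverVersion looks like "114.0.5735.90 (abc123...)"; keep the number only.
    driver_version = caps.get("chrome", {}).get("chromedriverVersion", "").split(" ")[0]

    if browser_version and driver_version:
        match = versions_match(browser_version, driver_version)
    else:
        match = None  # versions not reported, so nothing can be concluded

    return {
        "title": driver.title,
        "page_source_length": len(driver.page_source),
        # "xpath" is the string value behind Selenium's By.XPATH constant.
        "match_count": len(driver.find_elements("xpath", xpath)),
        "versions_match": match,
    }
```

Running `diagnose(driver, "//a[@class='button-border']")` right after `driver.get(...)` gives a quick picture: a title or page source that does not look like the results archive suggests the page did not load as expected, while a match count of 0 on an otherwise normal page points at the locator or at timing.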

