Python web scraping using Selenium - iterate through href link
I am trying to write a script that uses Selenium to download many files, each containing a different NHL player's game log. I want to download a file for each player in the following table: https://www.naturalstattrick.com/playerteams.php?fromseason=20142015&thruseason=20162017&stype=2&sit=all&score=all&stdoi=std&rate=y&team=ALL&pos=S&loc=B&toi=0.1&gpfilt=none&fd=&td=&tgp=410&lines=single
Once on that website, I want to click on every player's name in the table. When a player's name is clicked through its href link, a new window opens. There are a few drop-down menus at the top. I want to select "Rate" instead of "Counts" and "Game Log" instead of "Player Summary", and then click "Submit". Finally, I want to click on CSV(All) at the bottom to download a CSV file.
Here is my current code:
from selenium import webdriver
import csv
from selenium.webdriver.support.ui import Select
from datetime import date, timedelta
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
chromedriver =("C:/Users/Michel/Desktop/python/package/chromedriver_win32/chromedriver.exe")
driver = webdriver.Chrome(chromedriver)
driver.get("https://www.naturalstattrick.com/playerteams.php?fromseason=20142015&thruseason=20162017&stype=2&sit=all&score=all&stdoi=std&rate=y&team=ALL&pos=S&loc=B&toi=0.1&gpfilt=none&fd=&td=&tgp=410&lines=single")
table = driver.find_element_by_xpath("//table[@class='indreg dataTable no-footer DTFC_Cloned']")
for row in table.find_elements_by_xpath("//tr[@role='row']")
    links = driver.find_element_by_xpath('//a[@href]')
    links.click()
    select = Select(driver.find_element_by_name('rate'))
    select.select_by_value("y")
    select1 = Select(driver.find_element_by_name('v'))
    select1.select_by_value("g")
    select2 = Select(driver.find_element_by_type('submit'))
    select2.select_by_value("submit")
    WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.XPATH , '//div[@class="dt-button button-csv button-htm15"]')))
    CSVall = driver.find_element_by_xpath('//div[@class="dt-button button-csv button-htm15"]')
    CSVall.click()
driver.close()
I have tried changing different things, but I always get an error. Where is the problem?
Moreover, I think I should probably add a line after "driver.get" to wait for the website to load, because it takes a few seconds. I do not know what the expected condition to end the wait should be in this case.
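Would something like this be the right kind of condition (just a rough sketch, waiting for the players table itself to be present)?
WebDriverWait(driver, 15).until(
    EC.presence_of_element_located((By.XPATH, "//table[@class='indreg dataTable no-footer DTFC_Cloned']"))
)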
Thanks
You don't need to click each player link; save the URLs as a list instead. There are also several errors in your code: the for loop is missing a colon, find_element_by_type is not a WebDriver method, and the submit button is not a <select> element, so it can't be wrapped in Select. You can see working code below.
from selenium import webdriver
import csv
from selenium.webdriver.support.ui import Select
from datetime import date, timedelta
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
chromedriver =("C:/Users/Michel/Desktop/python/package/chromedriver_win32/chromedriver.exe")
driver = webdriver.Chrome(chromedriver)
driver.get("https://www.naturalstattrick.com/playerteams.php?fromseason=20142015&thruseason=20162017&stype=2&sit=all&score=all&stdoi=std&rate=y&team=ALL&pos=S&loc=B&toi=0.1&gpfilt=none&fd=&td=&tgp=410&lines=single")
playerLinks = driver.find_elements_by_xpath("//table[@class='indreg dataTable no-footer DTFC_Cloned']//a")
playerLinks = [p.get_attribute('href') for p in playerLinks]
print(len(playerLinks))
for url in playerLinks:
    driver.get(url)
    # select "Rate" instead of "Counts"
    select = Select(driver.find_element_by_name('rate'))
    select.select_by_value("y")
    # select "Game Log" instead of "Player Summary"
    select1 = Select(driver.find_element_by_name('v'))
    select1.select_by_value("g")
    driver.find_element_by_css_selector('input[type="submit"]').click()
    # wait for the CSV (All) button, then download
    WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.XPATH , '//a[@class="dt-button buttons-csv buttons-html5"][2]')))
    CSVall = driver.find_element_by_xpath('//a[@class="dt-button buttons-csv buttons-html5"][2]')
    CSVall.click()
driver.close()
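If you want the CSV files to end up in a specific folder rather than the default Downloads directory, one possible addition (a sketch, assuming Chrome; the folder path below is just an example, and depending on your Selenium version the keyword may be options or chrome_options) is to configure the download directory before creating the driver:
options = webdriver.ChromeOptions()
# example download folder -- change to wherever you want the CSVs saved
options.add_experimental_option("prefs", {
    "download.default_directory": r"C:\Users\Michel\Desktop\python\csv",
    "download.prompt_for_download": False,
})
driver = webdriver.Chrome(chromedriver, options=options)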
Rather than keep clicking through selections, you could grab the player IDs from the first page and concatenate them, along with the strings representing the Rate and Game Log selections, into the query string of a new URL. You can certainly tidy up the following.
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
def getPlayerId(url):
    id = url.split('playerid=')[1]
    id = id.split('&')[0]
    return id

def makeNewURL(playerId):
    return 'https://www.naturalstattrick.com/playerreport.php?fromseason=20142015&thruseason=20162017&stype=2&sit=all&stdoi=oi&rate=y&v=g&playerid=' + playerId
#chromedriver =("C:/Users/Michel/Desktop/python/package/chromedriver_win32/chromedriver.exe")
driver = webdriver.Chrome()
driver.get("https://www.naturalstattrick.com/playerteams.php?fromseason=20142015&thruseason=20162017&stype=2&sit=all&score=all&stdoi=std&rate=y&team=ALL&pos=S&loc=B&toi=0.1&gpfilt=none&fd=&td=&tgp=410&lines=single")
links = driver.find_elements_by_css_selector('table.indreg.dataTable.no-footer.DTFC_Cloned [href*=playerid]')
newLinks = []
for link in links:
    newLinks.append(link.get_attribute('href'))

for link in newLinks:
    playerId = getPlayerId(link)
    link = makeNewURL(playerId)
    driver.get(link)
    WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.XPATH , '//a[@class="dt-button buttons-csv buttons-html5"][2]')))
    CSVall = driver.find_element_by_xpath('//a[@class="dt-button buttons-csv buttons-html5"][2]')
    CSVall.click()
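If you want to tidy it further, the playerid could be pulled from the query string with urllib.parse instead of string splitting (just a sketch of an alternative to getPlayerId above):
from urllib.parse import urlparse, parse_qs

def getPlayerId(url):
    # parse the query string and return the value of the playerid parameter
    return parse_qs(urlparse(url).query)['playerid'][0]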