How to scrape a data table from a webpage after applying a filter?
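I'm trying to build a data scraper, but the table's values change once the necessary filter is applied. I'm not sure how to apply the filter using Selenium or another tool.
My plan was to load the base table first, then work out how to apply the filter and retrofit my code, but I'm stuck even on scraping the base table from the page. The filter I'm trying to apply is the dropdown labelled "Slates" on https://rotogrinders.com/projected-stats/nfl
I'm fairly confident this code grabs the correct table: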
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
url = 'https://rotogrinders.com/projected-stats/nfl-qb?site=fanduel'
driver.get(url)
table = driver.find_element_by_xpath("//*[@id='proj-stats']")
However, converting it to a pandas DataFrame is not going well.
results_table = []
for row in table:
    temp = []
    columns = row.find_element_by_xpath("//*[@id='proj-stats']/div[1]")
    for column in columns:
        temp.append(column.text)
    results_table.append(temp)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-79-bdda19bc35a3> in <module>
1 results_table = []
----> 2 for row in table:
3 temp = []
4 columns = row.find_element_by_xpath("//*[@id='proj-stats']/div[1]")
5 for column in columns:
TypeError: 'WebElement' object is not iterable
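The traceback comes from looping over the result of find_element (singular), which returns one WebElement rather than a list. find_elements (plural) returns a list that can be iterated. Below is a minimal sketch of that difference. The helper name row_texts and the default row XPath "./div" are assumptions made for illustration, not something confirmed by the page's markup.

```python
def row_texts(container, row_xpath="./div"):
    """Return the visible text of each row under a Selenium element.

    `row_xpath` is a hypothetical default; adjust it to the real markup.
    """
    # find_elements (plural) returns a list, so it is safe to iterate.
    # "xpath" is the value of Selenium's By.XPATH constant.
    rows = container.find_elements("xpath", row_xpath)
    return [row.text for row in rows]


# Usage with a live driver (not run here):
#   table = driver.find_element("xpath", "//*[@id='proj-stats']")
#   print(row_texts(table))
```

The function only relies on the element exposing find_elements and on each row exposing .text, so it can be exercised without a browser by passing in a stand-in object.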
If you want to get the player names and salaries and load them into a pandas DataFrame, try the following code.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

driver = webdriver.Chrome()
url = 'https://rotogrinders.com/projected-stats/nfl-qb?site=fanduel'
driver.get(url)

# Wait up to 10 seconds for the projections container to become visible
table = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//*[@id='proj-stats']")))

Player_Name = []
Player_Price = []
# find_elements(By.XPATH, ...) replaces find_elements_by_xpath, which newer Selenium 4 releases removed
for row in driver.find_elements(By.XPATH, ".//div[@class='player']/a"):
    Player_Name.append(row.text)
# The salary cells are the siblings that follow the 'Salary' column header
for row in driver.find_elements(By.XPATH, ".//div[@class='rgt-col']/div[@class='rgt-hdr'][contains(.,'Salary')]/following-sibling::div"):
    Player_Price.append(row.text)

df = pd.DataFrame({"Player Name": Player_Name, "Salary": Player_Price})
print(df)
Output:
Player Name Salary
0 Drew Brees $7.2K
1 Deshaun Watson $8.4K
2 Russell Wilson $8.6K
3 Mitchell Trubisky $6.5K
4 Josh Allen $7.7K
5 Matthew Stafford $7.9K
6 Jacoby Brissett $7.3K
7 Matthew Moore $6.5K
8 Daniel Jones $7.0K
9 Carson Wentz $7.4K
10 Aaron Rodgers $8.1K
11 Kirk Cousins $7.8K
12 Tom Brady $7.9K
13 Jameis Winston $7.5K
14 Jared Goff $8.0K
15 Gardner Minshew $6.9K
16 Ryan Tannehill $7.1K
17 Andy Dalton $6.9K
18 Mason Rudolph $7.1K
19 Jimmy Garoppolo $7.7K
20 Kyle Allen $6.8K
21 Kyler Murray $7.8K
22 Derek Carr $7.3K
23 Case Keenum $6.3K
24 Philip Rivers $7.2K
25 Ryan Fitzpatrick $7.0K
26 Joe Flacco $6.5K
27 Matt Schaub $6.6K
28 Sam Darnold $7.3K
29 Baker Mayfield $7.2K
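The Salary column above holds strings such as "$7.2K", so it cannot be sorted or summed numerically as it stands. A small sketch of one way to convert those strings to integers follows. The helper name parse_salary is hypothetical, and it assumes the only formats are a leading "$" with an optional trailing "K".

```python
def parse_salary(text):
    """Convert a string such as '$7.2K' to an integer dollar amount."""
    cleaned = text.strip().lstrip("$").replace(",", "")
    if cleaned.upper().endswith("K"):
        # 'K' means thousands; round() absorbs floating-point noise
        return round(float(cleaned[:-1]) * 1000)
    return round(float(cleaned))


print(parse_salary("$7.2K"))
```

With the DataFrame from the answer, this could be applied as df["Salary"].map(parse_salary).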
It's all in JSON format inside the <script> tags. For example, you can iterate over the slate ids and match them with the players and salaries for those slates:
import requests
from bs4 import BeautifulSoup
import json

url = 'https://rotogrinders.com/projected-stats/nfl-qb?site=fanduel'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Slate definitions: cut out the text between 'slates:' and 'schedules:'
# (the script indexes 12 and 13 depend on the page layout and may shift)
script = soup.find_all('script')[12].text
jsonStr_slate = script.split('slates:')[-1]
jsonStr_slate = jsonStr_slate.split('schedules:')
jsonStr_slate = jsonStr_slate[0].rsplit(',', 1)[0]
slatesData = json.loads(jsonStr_slate)

# Player projections: the JSON assigned to 'data'
script = soup.find_all('script')[13].text
jsonStr = script.split('data = ')[-1]
jsonStr = jsonStr.rsplit(';', 4)[0]
jsonData = json.loads(jsonStr)

# Match each player's per-slate salary to the slate name
for each in jsonData:
    name = each['player_name']
    for slate in each['import_data']:
        slate_id = slate['slate_id']
        salary = slate['salary']
        for k, v in slatesData.items():
            if v['importId'] == slate_id:
                print('%-20s $%-8s %s' % (name, salary, k))
Output:
Russell Wilson $8600 8:20pm Thu-Mon
Russell Wilson $8600 2:00pm Main
Russell Wilson $8600 2:00pm Sun-Mon
Russell Wilson $9500 2:00pm SuperFlex
Russell Wilson $8600 5:05pm 4pm Only
Lamar Jackson $8000 8:20pm Thu-Mon
Lamar Jackson $8000 2:00pm Sun-Mon
Lamar Jackson $8800 2:00pm SuperFlex
Mitchell Trubisky $6500 8:20pm Thu-Mon
Mitchell Trubisky $6500 2:00pm Main
Mitchell Trubisky $6500 2:00pm 1pm Only
Mitchell Trubisky $6500 2:00pm Sun-Mon
Mitchell Trubisky $6800 2:00pm SuperFlex
Deshaun Watson $8400 8:20pm Thu-Mon
Dak Prescott $7800 8:20pm Thu-Mon
Dak Prescott $7800 2:00pm Sun-Mon
Josh Allen $7700 8:20pm Thu-Mon
Josh Allen $7700 2:00pm Main
Josh Allen $7700 2:00pm 1pm Only
Josh Allen $7700 2:00pm Sun-Mon
Josh Allen $8400 2:00pm SuperFlex
Jameis Winston $7500 8:20pm Thu-Mon
Jameis Winston $7500 2:00pm Main
Jameis Winston $7500 2:00pm Sun-Mon
Jameis Winston $8200 2:00pm SuperFlex
Jameis Winston $7500 5:05pm 4pm Only
Jimmy Garoppolo $15500 8:20pm SF @ ARI
Jimmy Garoppolo $7600 8:20pm Thu-Mon
Jacoby Brissett $7300 8:20pm Thu-Mon
Jacoby Brissett $7300 2:00pm Main
Jacoby Brissett $7300 2:00pm 1pm Only
Jacoby Brissett $7300 2:00pm Sun-Mon
Jacoby Brissett $7900 2:00pm SuperFlex
Patrick Mahomes $8500 8:20pm Thu-Mon
Patrick Mahomes $8500 2:00pm Main
Patrick Mahomes $8500 2:00pm 1pm Only
Patrick Mahomes $8500 2:00pm Sun-Mon
Patrick Mahomes $9400 2:00pm SuperFlex
Carson Wentz $7400 8:20pm Thu-Mon
Carson Wentz $7400 2:00pm Main
Carson Wentz $7400 2:00pm 1pm Only
Carson Wentz $7400 2:00pm Sun-Mon
Carson Wentz $8000 2:00pm SuperFlex
Aaron Rodgers $8100 8:20pm Thu-Mon
Aaron Rodgers $8100 2:00pm Main
Aaron Rodgers $8100 2:00pm Sun-Mon
Aaron Rodgers $9000 2:00pm SuperFlex
Aaron Rodgers $8100 5:05pm 4pm Only
Derek Carr $7300 8:20pm Thu-Mon
Derek Carr $7300 2:00pm Main
Derek Carr $7300 2:00pm Sun-Mon
Derek Carr $7900 2:00pm SuperFlex
Derek Carr $7300 5:05pm 4pm Only
Tom Brady $7900 8:20pm Thu-Mon
Tom Brady $7900 2:00pm Sun-Mon
Tom Brady $8700 2:00pm SuperFlex
Kirk Cousins $7800 8:20pm Thu-Mon
Kirk Cousins $7800 2:00pm Main
Kirk Cousins $7800 2:00pm 1pm Only
Kirk Cousins $7800 2:00pm Sun-Mon
Kirk Cousins $8500 2:00pm SuperFlex
Daniel Jones $7300 8:20pm Thu-Mon
Daniel Jones $7300 2:00pm Sun-Mon
Kyle Allen $6800 8:20pm Thu-Mon
Kyle Allen $6800 2:00pm Main
Kyle Allen $6800 2:00pm 1pm Only
Kyle Allen $6800 2:00pm Sun-Mon
Kyle Allen $7200 2:00pm SuperFlex
Gardner Minshew $7200 8:20pm Thu-Mon
Philip Rivers $7200 8:20pm Thu-Mon
Philip Rivers $7200 2:00pm Main
Philip Rivers $7200 2:00pm Sun-Mon
Philip Rivers $7700 2:00pm SuperFlex
Philip Rivers $7200 5:05pm 4pm Only
Mason Rudolph $6800 8:20pm Thu-Mon
Mason Rudolph $6800 2:00pm Main
Mason Rudolph $6800 2:00pm 1pm Only
Mason Rudolph $6800 2:00pm Sun-Mon
Mason Rudolph $7200 2:00pm SuperFlex
Sam Darnold $7300 8:20pm Thu-Mon
Sam Darnold $7300 2:00pm Main
Sam Darnold $7300 2:00pm 1pm Only
Sam Darnold $7300 2:00pm Sun-Mon
Sam Darnold $7800 2:00pm SuperFlex
Matthew Stafford $7900 8:20pm Thu-Mon
Matthew Stafford $7900 2:00pm Main
Matthew Stafford $7900 2:00pm Sun-Mon
Matthew Stafford $8700 2:00pm SuperFlex
Matthew Stafford $7900 5:05pm 4pm Only
Kyler Murray $15000 8:20pm SF @ ARI
Kyler Murray $7200 8:20pm Thu-Mon
Brandon Allen $6000 8:20pm Thu-Mon
Brandon Allen $6000 2:00pm Main
Brandon Allen $6000 2:00pm Sun-Mon
Brandon Allen $6200 2:00pm SuperFlex
Brandon Allen $6000 5:05pm 4pm Only
Ryan Tannehill $7100 8:20pm Thu-Mon
Ryan Tannehill $7100 2:00pm Main
Ryan Tannehill $7100 2:00pm 1pm Only
Ryan Tannehill $7100 2:00pm Sun-Mon
Ryan Tannehill $7500 2:00pm SuperFlex
Baker Mayfield $7200 8:20pm Thu-Mon
Baker Mayfield $7200 2:00pm Main
Baker Mayfield $7200 2:00pm Sun-Mon
Baker Mayfield $7700 2:00pm SuperFlex
Baker Mayfield $7200 5:05pm 4pm Only
Ryan Fitzpatrick $7000 8:20pm Thu-Mon
Ryan Fitzpatrick $7000 2:00pm Main
Ryan Fitzpatrick $7000 2:00pm 1pm Only
Ryan Fitzpatrick $7000 2:00pm Sun-Mon
Ryan Fitzpatrick $7400 2:00pm SuperFlex
Case Keenum $6300 8:20pm Thu-Mon
Case Keenum $6300 2:00pm Main
Case Keenum $6300 2:00pm 1pm Only
Case Keenum $6300 2:00pm Sun-Mon
Case Keenum $6600 2:00pm SuperFlex
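Since the original question was about applying the "Slates" filter, the same JSON can be narrowed to a single slate instead of printing every combination. The sketch below is one possible way to do that, not the answer author's method. It uses json.JSONDecoder.raw_decode to read one JSON value after a marker (an alternative to the split/rsplit trimming above) and then keeps only the rows for one slate name. The helper names, the sample script text, the integer salaries and the ids 101 and 102 are made up for illustration; only the key names (player_name, import_data, slate_id, salary, importId) come from the answer.

```python
import json


def extract_json_after(text, marker):
    """Parse the first JSON value that follows `marker` in `text`."""
    start = text.index(marker) + len(marker)
    # raw_decode does not skip leading whitespace, so do it here
    while text[start].isspace():
        start += 1
    # Parses exactly one JSON value and ignores whatever trails it
    value, _end = json.JSONDecoder().raw_decode(text, start)
    return value


def rows_for_slate(players, slates, slate_name):
    """Return (player_name, salary) pairs for one slate, e.g. '2:00pm Main'."""
    wanted_id = slates[slate_name]["importId"]
    return [
        (player["player_name"], entry["salary"])
        for player in players
        for entry in player["import_data"]
        if entry["slate_id"] == wanted_id
    ]


# Hypothetical script text; the ids 101 and 102 are invented for this example
sample_script = (
    'slates: {"2:00pm Main": {"importId": 101}, '
    '"2:00pm SuperFlex": {"importId": 102}}, schedules: {}; '
    'var data = [{"player_name": "Russell Wilson", "import_data": '
    '[{"slate_id": 101, "salary": 8600}, {"slate_id": 102, "salary": 9500}]}];'
)

slates = extract_json_after(sample_script, "slates:")
players = extract_json_after(sample_script, "data = ")
main_rows = rows_for_slate(players, slates, "2:00pm Main")
print(main_rows)
```

The resulting list of tuples can be passed to pd.DataFrame(main_rows, columns=["Player Name", "Salary"]) if a DataFrame is needed. Whether the real page's JSON parses cleanly with this approach would need to be checked against the live script contents.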