Scraping a response from a selected option in a dropdown list
This is an example of a page that lists baseball stats for a selected player, defaulting to the most recent year (2014, soon to be 2015): http://www.koreabaseball.com/Record/Player/HitterDetail/Game.aspx?playerId=76325
The drop-down list lets the user select years back to 2010, but selecting one does not change the displayed URL. How can I scrape all the available years, one for each value in the drop-down list?
I'm currently using Python and BeautifulSoup, but I'm willing to use whatever will get the job done.
<select name="ctl00$ctl00$cphContainer$cphContents$ddlYear"
        onchange="javascript:setTimeout('__doPostBack(\'ctl00$ctl00$cphContainer$cphContents$ddlYear\',\'\')', 0)"
        id="cphContainer_cphContents_ddlYear"
        class="select02 mgt30">
    <option value="2014">2014</option>
    <option value="2013">2013</option>
    <option selected="selected" value="2012">2012</option>
    <option value="2011">2011</option>
    <option value="2010">2010</option>
</select>
Do it in two steps:
1. Make a GET request and parse the form's input values.
2. Make a POST request, passing those input values along with the ctl00$ctl00$cphContainer$cphContents$ddlYear parameter, which is responsible for the year.
Implementation example for year 2013 (using requests and BeautifulSoup):
from bs4 import BeautifulSoup
import requests

url = 'http://www.koreabaseball.com/Record/Player/HitterDetail/Game.aspx?playerId=76325'

with requests.Session() as session:
    session.headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36'}

    # Step 1: GET the page and read the hidden ASP.NET form fields
    response = session.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')

    data = {
        # the dropdown value selects the year to display
        'ctl00$ctl00$cphContainer$cphContents$ddlYear': '2013',
        'ctl00$ctl00$txtSearchWord': '',
        '__EVENTTARGET': soup.find('input', {'name': '__EVENTTARGET'}).get('value', ''),
        '__EVENTARGUMENT': soup.find('input', {'name': '__EVENTARGUMENT'}).get('value', ''),
        '__LASTFOCUS': soup.find('input', {'name': '__LASTFOCUS'}).get('value', ''),
        '__VIEWSTATE': soup.find('input', {'name': '__VIEWSTATE'}).get('value', ''),
        '__VIEWSTATEGENERATOR': soup.find('input', {'name': '__VIEWSTATEGENERATOR'}).get('value', ''),
        '__EVENTVALIDATION': soup.find('input', {'name': '__EVENTVALIDATION'}).get('value', ''),
    }

    # Step 2: POST the form back to the same URL and parse the stats tables
    response = session.post(url, data=data)
    soup = BeautifulSoup(response.content, 'html.parser')

    for row in soup.select('table.tData01 tr'):
        # print(...) with a single argument behaves the same on Python 2 and 3
        print([td.text for td in row.find_all('td')])
This prints the contents of all stats tables for 2013:
[u'KIA', u'16', u'0.364', u'55', u'8', u'20', u'3', u'0', u'3', u'11', u'5', u'0', u'14', u'0', u'14', u'1']
[u'LG', u'15', u'0.321', u'53', u'7', u'17', u'1', u'0', u'2', u'9', u'1', u'1', u'6', u'3', u'10', u'2']
[u'NC', u'16', u'0.237', u'59', u'5', u'14', u'2', u'0', u'2', u'10', u'2', u'0', u'3', u'0', u'17', u'2']
[u'SK', u'16', u'0.235', u'51', u'7', u'12', u'1', u'0', u'3', u'13', u'1', u'3', u'13', u'1', u'12', u'4']
[u'\ub450\uc0b0', u'16', u'0.368', u'57', u'16', u'21', u'2', u'1', u'4', u'21', u'2', u'1', u'12', u'0', u'13', u'2']
[u'\ub86f\ub370', u'16', u'0.375', u'56', u'9', u'21', u'4', u'0', u'3', u'13', u'4', u'3', u'11', u'0', u'9', u'3']
[u'\uc0bc\uc131', u'16', u'0.226', u'62', u'8', u'14', u'5', u'0', u'3', u'10', u'0', u'0', u'8', u'1', u'15', u'1']
[u'\ud55c\ud654', u'15', u'0.211', u'57', u'7', u'12', u'3', u'0', u'2', u'9', u'0', u'0', u'1', u'1', u'19', u'3']
...
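To cover every year rather than just 2013, the same two steps can be repeated for each option in the dropdown. The following is only a sketch under some assumptions: the helper names (FormStateParser, read_form_state, build_postback, fetch_all_years) are made up for illustration, the parsing is done with the standard library's html.parser instead of BeautifulSoup so the helpers can be tried offline, and setting __EVENTTARGET to the dropdown's name simply mimics what the __doPostBack call in the onchange handler would do in a browser. Whether the server needs any fields beyond these has not been verified.

```python
from html.parser import HTMLParser

YEAR_FIELD = 'ctl00$ctl00$cphContainer$cphContents$ddlYear'


class FormStateParser(HTMLParser):
    """Collect hidden <input> values and the year dropdown's <option> values."""

    def __init__(self):
        super().__init__()
        self.hidden = {}   # name -> value of every hidden input (__VIEWSTATE etc.)
        self.years = []    # option values found inside the year <select>
        self._in_year_select = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'input' and attrs.get('type') == 'hidden' and attrs.get('name'):
            self.hidden[attrs['name']] = attrs.get('value') or ''
        elif tag == 'select':
            self._in_year_select = attrs.get('name') == YEAR_FIELD
        elif tag == 'option' and self._in_year_select and attrs.get('value'):
            self.years.append(attrs['value'])

    def handle_endtag(self, tag):
        if tag == 'select':
            self._in_year_select = False


def read_form_state(html):
    """Return (hidden_fields, years) parsed from a page's HTML."""
    parser = FormStateParser()
    parser.feed(html)
    return parser.hidden, parser.years


def build_postback(hidden, year):
    """Build the POST payload for one year without mutating `hidden`."""
    data = dict(hidden)
    data['__EVENTTARGET'] = YEAR_FIELD      # what __doPostBack would set
    data['__EVENTARGUMENT'] = ''
    data.setdefault('ctl00$ctl00$txtSearchWord', '')
    data[YEAR_FIELD] = year
    return data


def fetch_all_years(url):
    """Return {year: html} for every year offered by the dropdown."""
    import requests  # third-party; imported here so the helpers above work without it

    pages = {}
    with requests.Session() as session:
        session.headers.update({'User-Agent': 'Mozilla/5.0'})
        hidden, years = read_form_state(session.get(url).text)
        for year in years:
            response = session.post(url, data=build_postback(hidden, year))
            pages[year] = response.text
            # reuse the freshest __VIEWSTATE and friends for the next request
            hidden, _ = read_form_state(response.text)
    return pages
```

Each page returned by fetch_all_years(url) can then be handed to BeautifulSoup and walked with the same 'table.tData01 tr' selector used above.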
An example using Mechanize and Ruby. Modify the form field and submit.
#!/usr/bin/env ruby
require 'mechanize'

# keep no page history, since only the current page is needed
agent = Mechanize.new { |a| a.history.max_size = 0 }
agent.user_agent = 'Mozilla/5.0'

url = "http://www.koreabaseball.com/Record/Player/HitterDetail/Game.aspx?playerId=76325"
page = agent.get(url)
form = page.forms[0]

# year selected by default
p form['ctl00$ctl00$cphContainer$cphContents$ddlYear']

# change the dropdown value and submit the form
form['ctl00$ctl00$cphContainer$cphContents$ddlYear'] = '2013'
page = form.submit

# year selected in the returned page
form = page.forms[0]
p form['ctl00$ctl00$cphContainer$cphContents$ddlYear']
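The Ruby script prints the dropdown's value before and after the submit, which is a handy way to confirm that the postback really switched years. A rough Python counterpart of that check is sketched below; selected_value and SelectedOptionParser are hypothetical helper names, and only the standard library is used so it runs without extra packages.

```python
from html.parser import HTMLParser


class SelectedOptionParser(HTMLParser):
    """Remember the value of the <option> marked selected in one <select>."""

    def __init__(self, select_name):
        super().__init__()
        self.select_name = select_name
        self.selected = None
        self._inside = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'select':
            self._inside = attrs.get('name') == self.select_name
        elif tag == 'option' and self._inside and 'selected' in attrs:
            self.selected = attrs.get('value')

    def handle_endtag(self, tag):
        if tag == 'select':
            self._inside = False


def selected_value(html, select_name):
    """Return the selected option's value, or None if nothing is marked selected."""
    parser = SelectedOptionParser(select_name)
    parser.feed(html)
    return parser.selected
```

Calling selected_value(response.text, 'ctl00$ctl00$cphContainer$cphContents$ddlYear') on the page returned by the POST and comparing the result with the requested year would flag a postback that was silently ignored.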