Scraping the response for a selected option in a drop-down list

Question

This is an example of a page that lists baseball stats for a selected player, defaulting to the most recent year (2014, soon to be 2015): http://www.koreabaseball.com/Record/Player/HitterDetail/Game.aspx?playerId=76325

The drop-down list lets the user select years back to 2010, but selecting a year does not change the displayed URL. How can I scrape the stats for all the available years, one for each value in the drop-down list?

I'm currently using Python and BeautifulSoup, but I'm willing to use whatever will get the job done.

<select name="ctl00$ctl00$cphContainer$cphContents$ddlYear"     
        onchange="javascript:setTimeout(&#39;__doPostBack(\&#39;ctl00$ctl00$cphContainer$cphContents$ddlYear\&#39;,\&#39;\&#39;)&#39;, 0)" 
        id="cphContainer_cphContents_ddlYear" 
        class="select02 mgt30">
<option value="2014">2014</option>
<option value="2013">2013</option>
<option selected="selected" value="2012">2012</option>
<option value="2011">2011</option>
<option value="2010">2010</option>
</select>

Answer 1

Do it in two steps:

1. Make a GET request, parse the HTML and extract the form input values.
2. Make a POST request, passing those input values along with the ctl00$ctl00$cphContainer$cphContents$ddlYear parameter, which selects the year.
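The first step can also be done generically, by collecting every hidden input on the page instead of naming each one. The following is a minimal sketch, not part of the original answer: `hidden_fields` is a hypothetical helper name, and it uses only the standard library's `html.parser` so that it runs without extra dependencies (the same could be done with BeautifulSoup).

```python
from html.parser import HTMLParser


class _HiddenInputCollector(HTMLParser):
    """Collects name/value pairs of every <input type="hidden"> it sees."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <input ... />.
        if tag != 'input':
            return
        attrs = dict(attrs)
        is_hidden = (attrs.get('type') or '').lower() == 'hidden'
        if is_hidden and attrs.get('name'):
            # A missing or empty value attribute is sent as an empty string.
            self.fields[attrs['name']] = attrs.get('value') or ''


def hidden_fields(html):
    """Return a dict of all hidden form inputs found in the HTML text."""
    collector = _HiddenInputCollector()
    collector.feed(html)
    return collector.fields
```

Called as `hidden_fields(response.text)` on the GET response, this should yield fields such as `__VIEWSTATE` and `__EVENTVALIDATION`, assuming the page renders them as hidden inputs.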

Implementation example for year 2013 (using requests and BeautifulSoup):

from bs4 import BeautifulSoup
import requests

url = 'http://www.koreabaseball.com/Record/Player/HitterDetail/Game.aspx?playerId=76325'

with requests.Session() as session:
    session.headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36'}

    # fetch the page and parse the hidden ASP.NET form fields
    response = session.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')

    data = {
        'ctl00$ctl00$cphContainer$cphContents$ddlYear': '2013',
        'ctl00$ctl00$txtSearchWord': '',
        '__EVENTTARGET': soup.find('input', {'name': '__EVENTTARGET'}).get('value', ''),
        '__EVENTARGUMENT': soup.find('input', {'name': '__EVENTARGUMENT'}).get('value', ''),
        '__LASTFOCUS': soup.find('input', {'name': '__LASTFOCUS'}).get('value', ''),
        '__VIEWSTATE': soup.find('input', {'name': '__VIEWSTATE'}).get('value', ''),
        '__VIEWSTATEGENERATOR': soup.find('input', {'name': '__VIEWSTATEGENERATOR'}).get('value', ''),
        '__EVENTVALIDATION': soup.find('input', {'name': '__EVENTVALIDATION'}).get('value', ''),
    }

    # post the form back with the chosen year and parse the returned page
    response = session.post(url, data=data)

    soup = BeautifulSoup(response.content, 'html.parser')

    for row in soup.select('table.tData01 tr'):
        print([td.text for td in row.find_all('td')])

This prints the contents of all stats tables for 2013 (output shown as printed by Python 2):

[u'KIA', u'16', u'0.364', u'55', u'8', u'20', u'3', u'0', u'3', u'11', u'5', u'0', u'14', u'0', u'14', u'1']
[u'LG', u'15', u'0.321', u'53', u'7', u'17', u'1', u'0', u'2', u'9', u'1', u'1', u'6', u'3', u'10', u'2']
[u'NC', u'16', u'0.237', u'59', u'5', u'14', u'2', u'0', u'2', u'10', u'2', u'0', u'3', u'0', u'17', u'2']
[u'SK', u'16', u'0.235', u'51', u'7', u'12', u'1', u'0', u'3', u'13', u'1', u'3', u'13', u'1', u'12', u'4']
[u'\ub450\uc0b0', u'16', u'0.368', u'57', u'16', u'21', u'2', u'1', u'4', u'21', u'2', u'1', u'12', u'0', u'13', u'2']
[u'\ub86f\ub370', u'16', u'0.375', u'56', u'9', u'21', u'4', u'0', u'3', u'13', u'4', u'3', u'11', u'0', u'9', u'3']
[u'\uc0bc\uc131', u'16', u'0.226', u'62', u'8', u'14', u'5', u'0', u'3', u'10', u'0', u'0', u'8', u'1', u'15', u'1']
[u'\ud55c\ud654', u'15', u'0.211', u'57', u'7', u'12', u'3', u'0', u'2', u'9', u'0', u'0', u'1', u'1', u'19', u'3']
...
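The question asks for every year, not only 2013. One way to extend the approach is to read the year values from the drop-down itself and build one POST payload per year. This is a rough sketch under assumptions: `available_years` and `payload_for_year` are hypothetical helper names, and it assumes the hidden values from the first GET can be reused for each POST, which may not hold if the site changes `__VIEWSTATE` between requests.

```python
from html.parser import HTMLParser

YEAR_FIELD = 'ctl00$ctl00$cphContainer$cphContents$ddlYear'


class _OptionCollector(HTMLParser):
    """Collects <option> values inside the <select> with a given name."""

    def __init__(self, select_name):
        super().__init__()
        self.select_name = select_name
        self.inside = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'select':
            self.inside = attrs.get('name') == self.select_name
        elif tag == 'option' and self.inside and attrs.get('value'):
            self.values.append(attrs['value'])

    def handle_endtag(self, tag):
        if tag == 'select':
            self.inside = False


def available_years(html, select_name=YEAR_FIELD):
    """Return the option values of the year drop-down, in page order."""
    collector = _OptionCollector(select_name)
    collector.feed(html)
    return collector.values


def payload_for_year(hidden, year, select_name=YEAR_FIELD):
    """Copy the hidden form values and set the year field for one POST."""
    data = dict(hidden)
    data[select_name] = year
    return data
```

Inside the `with` block above, one could then loop `for year in available_years(response.text)` over the first GET response and call `session.post(url, data=payload_for_year(hidden, year))`, where `hidden` is a dict of the hidden inputs built like `data` in the example but without the year entry.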

Answer 2

An example using Mechanize and Ruby: modify the form field and submit.

#!/usr/bin/env ruby

require 'mechanize'

agent = Mechanize.new{ |agent| agent.history.max_size=0 }

agent.user_agent = 'Mozilla/5.0'

url = "http://www.koreabaseball.com/Record/Player/HitterDetail/Game.aspx?playerId=76325"

page = agent.get(url)

form = page.forms[0]

p form['ctl00$ctl00$cphContainer$cphContents$ddlYear']

form['ctl00$ctl00$cphContainer$cphContents$ddlYear'] = 2013

page = form.submit

form = page.forms[0]

p form['ctl00$ctl00$cphContainer$cphContents$ddlYear']

Answer 1
4 2015-03-06 05:11:58

Answer 2
0 2015-03-06 05:43:27