Scrape the response for a selected option in a dropdown list

Question

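Here is an example page that lists baseball statistics for a selected player, defaulting to the most recent year (2014, soon to be 2015): http://www.koreabaseball.com/Record/Player/HitterDetail/Game.aspx?playerId=76325

A dropdown list lets the user select a year going back to 2010, but selecting one does not change the displayed URL. How can I scrape the data for every available year by going through each value in the dropdown?

I'm currently using Python and BeautifulSoup, but I'm open to anything that gets the job done.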

<select name="ctl00$ctl00$cphContainer$cphContents$ddlYear"     
        onchange="javascript:setTimeout(&#39;__doPostBack(\&#39;ctl00$ctl00$cphContainer$cphContents$ddlYear\&#39;,\&#39;\&#39;)&#39;, 0)" 
        id="cphContainer_cphContents_ddlYear" 
        class="select02 mgt30">
<option value="2014">2014</option>
<option value="2013">2013</option>
<option selected="selected" value="2012">2012</option>
<option value="2011">2011</option>
<option value="2010">2010</option>

Answer 1

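Do it in two steps:

Make a GET request, parse the HTML, and extract the form input values
Make a POST request, passing the parsed input values along with the ctl00$ctl00$cphContainer$cphContents$ddlYear parameter, which is responsible for the year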
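
The two steps above can be sketched without hard-coding each hidden field name. This is a minimal sketch, not the answer's own code: it uses the standard library's html.parser instead of BeautifulSoup so it runs with no third-party packages, and the names HiddenInputParser and build_postback_payload are hypothetical. It copies only inputs of type hidden, so any other field the server expects (such as ctl00$ctl00$txtSearchWord in the example below) would still have to be added by hand.

```python
from html.parser import HTMLParser

# Name of the year dropdown, taken from the <select> in the question.
YEAR_FIELD = 'ctl00$ctl00$cphContainer$cphContents$ddlYear'


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_hidden = (attrs.get('type') or '').lower() == 'hidden'
        if tag == 'input' and is_hidden and attrs.get('name'):
            # A missing value attribute is stored as an empty string.
            self.fields[attrs['name']] = attrs.get('value') or ''


def build_postback_payload(html, year):
    """Return POST data: every hidden field from the page plus the chosen year."""
    parser = HiddenInputParser()
    parser.feed(html)
    payload = dict(parser.fields)
    payload[YEAR_FIELD] = str(year)
    return payload
```

The returned dictionary could then be passed as the data argument of session.post(url, data=payload).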

Example implementation for 2013 (using requests and BeautifulSoup):

from bs4 import BeautifulSoup
import requests

url = 'http://www.koreabaseball.com/Record/Player/HitterDetail/Game.aspx?playerId=76325'

with requests.Session() as session:
    # send a browser-like User-Agent with every request in this session
    session.headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36'}

    # step 1: GET the page and parse the form's input values
    response = session.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')

    data = {
        # the year to select in the dropdown
        'ctl00$ctl00$cphContainer$cphContents$ddlYear': '2013',
        'ctl00$ctl00$txtSearchWord': '',
        # hidden fields copied from the page as-is
        '__EVENTTARGET': soup.find('input', {'name': '__EVENTTARGET'}).get('value', ''),
        '__EVENTARGUMENT': soup.find('input', {'name': '__EVENTARGUMENT'}).get('value', ''),
        '__LASTFOCUS': soup.find('input', {'name': '__LASTFOCUS'}).get('value', ''),
        '__VIEWSTATE': soup.find('input', {'name': '__VIEWSTATE'}).get('value', ''),
        '__VIEWSTATEGENERATOR': soup.find('input', {'name': '__VIEWSTATEGENERATOR'}).get('value', ''),
        '__EVENTVALIDATION': soup.find('input', {'name': '__EVENTVALIDATION'}).get('value', ''),
    }

    # step 2: POST the form back and parse the returned statistics
    response = session.post(url, data=data)

    soup = BeautifulSoup(response.content, 'html.parser')

    # print the cell texts of each table row
    for row in soup.select('table.tData01 tr'):
        print([td.text for td in row.find_all('td')])

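This prints the contents of all the statistics tables for 2013 (the output below is from Python 2, hence the u'' prefixes and the escaped Korean team names):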

[u'KIA', u'16', u'0.364', u'55', u'8', u'20', u'3', u'0', u'3', u'11', u'5', u'0', u'14', u'0', u'14', u'1']
[u'LG', u'15', u'0.321', u'53', u'7', u'17', u'1', u'0', u'2', u'9', u'1', u'1', u'6', u'3', u'10', u'2']
[u'NC', u'16', u'0.237', u'59', u'5', u'14', u'2', u'0', u'2', u'10', u'2', u'0', u'3', u'0', u'17', u'2']
[u'SK', u'16', u'0.235', u'51', u'7', u'12', u'1', u'0', u'3', u'13', u'1', u'3', u'13', u'1', u'12', u'4']
[u'\ub450\uc0b0', u'16', u'0.368', u'57', u'16', u'21', u'2', u'1', u'4', u'21', u'2', u'1', u'12', u'0', u'13', u'2']
[u'\ub86f\ub370', u'16', u'0.375', u'56', u'9', u'21', u'4', u'0', u'3', u'13', u'4', u'3', u'11', u'0', u'9', u'3']
[u'\uc0bc\uc131', u'16', u'0.226', u'62', u'8', u'14', u'5', u'0', u'3', u'10', u'0', u'0', u'8', u'1', u'15', u'1']
[u'\ud55c\ud654', u'15', u'0.211', u'57', u'7', u'12', u'3', u'0', u'2', u'9', u'0', u'0', u'1', u'1', u'19', u'3']
...
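
The example above fetches one year. To cover every year, as the question asks, the option values can be read from the dropdown on the first page and the same POST repeated for each. The following is a sketch under assumptions: extract_years and scrape_all_years are hypothetical names, fetch_year stands for any caller-supplied function that performs the POST shown above for a given year and returns its result, and the parsing uses the standard library's html.parser rather than BeautifulSoup.

```python
from html.parser import HTMLParser

# Name of the year dropdown, taken from the <select> in the question.
YEAR_FIELD = 'ctl00$ctl00$cphContainer$cphContents$ddlYear'


class YearOptionParser(HTMLParser):
    """Collect the value of each <option> inside the year <select> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.years = []
        self._inside_year_select = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'select':
            # Only options belonging to the year dropdown are of interest.
            self._inside_year_select = attrs.get('name') == YEAR_FIELD
        elif tag == 'option' and self._inside_year_select and attrs.get('value'):
            self.years.append(attrs['value'])

    def handle_endtag(self, tag):
        if tag == 'select':
            self._inside_year_select = False


def extract_years(html):
    """Return the dropdown's option values in page order."""
    parser = YearOptionParser()
    parser.feed(html)
    return parser.years


def scrape_all_years(first_page_html, fetch_year):
    """Call fetch_year once per available year and return {year: result}."""
    return {year: fetch_year(year) for year in extract_years(first_page_html)}
```

With requests, fetch_year could wrap the session.post call from the example, setting the ddlYear entry of data to the year it receives.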

Answer 2

An example using Mechanize and Ruby. Modify the form field and submit.

#!/usr/bin/env ruby

require 'mechanize'

# Keep no page history while browsing
agent = Mechanize.new { |a| a.history.max_size = 0 }
agent.user_agent = 'Mozilla/5.0'

url = "http://www.koreabaseball.com/Record/Player/HitterDetail/Game.aspx?playerId=76325"

# Fetch the page and take the first form on it
page = agent.get(url)
form = page.forms[0]

# Print the year currently selected in the dropdown
p form['ctl00$ctl00$cphContainer$cphContents$ddlYear']

# Select 2013 and submit the form
form['ctl00$ctl00$cphContainer$cphContents$ddlYear'] = 2013
page = form.submit

# Print the year selected in the returned page
form = page.forms[0]
p form['ctl00$ctl00$cphContainer$cphContents$ddlYear']

Answer 1: 4, 2015-03-06 05:11:58

Answer 2: 0, 2015-03-06 05:43:27
