
Scraping a response from a selected option in a dropdown list

This is an example of a page that lists baseball stats for a selected player, defaulting to the most recent year (2014, soon to be 2015): http://www.koreabaseball.com/Record/Player/HitterDetail/Game.aspx?playerId=76325

The drop-down list lets the user select years back to 2010, but selecting a year does not change the displayed URL. How can I scrape all the available years, one for each value in the drop-down list?

I'm currently using Python and BeautifulSoup, but I'm willing to use whatever will get the job done.

<select name="ctl00$ctl00$cphContainer$cphContents$ddlYear"     
        onchange="javascript:setTimeout(&#39;__doPostBack(\&#39;ctl00$ctl00$cphContainer$cphContents$ddlYear\&#39;,\&#39;\&#39;)&#39;, 0)" 
        id="cphContainer_cphContents_ddlYear" 
        class="select02 mgt30">
<option value="2014">2014</option>
<option value="2013">2013</option>
<option selected="selected" value="2012">2012</option>
<option value="2011">2011</option>
<option value="2010">2010</option>

Do it in two steps:

  • make a GET request, parse the HTML and extract the form input values
  • make a POST request, passing those input values along with the ctl00$ctl00$cphContainer$cphContents$ddlYear parameter, which is responsible for the year
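
As a rough, dependency-free sketch of how the two steps fit together, the payload for the POST could be assembled from whatever inputs the fetched page contains. The names InputCollector, build_payload and YEAR_FIELD are made up for this illustration. Setting __EVENTTARGET to the dropdown's name is an assumption that mirrors what the onchange handler's __doPostBack call does in a browser; it is not something the page is known to require.

```python
from html.parser import HTMLParser

# Name of the year dropdown, taken from the <select> in the question.
YEAR_FIELD = 'ctl00$ctl00$cphContainer$cphContents$ddlYear'

# Input types a browser does not submit unless they are clicked.
BUTTON_TYPES = {'submit', 'button', 'image', 'reset'}


class InputCollector(HTMLParser):
    """Collect name/value pairs of the <input> elements in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        attrs = dict(attrs)
        name = attrs.get('name')
        if not name or (attrs.get('type') or '').lower() in BUTTON_TYPES:
            return
        # A missing value attribute is treated as an empty string.
        self.fields[name] = attrs.get('value') or ''


def build_payload(html, year):
    """Return POST data for `year`, reusing the page's own input values."""
    collector = InputCollector()
    collector.feed(html)
    payload = dict(collector.fields)
    payload[YEAR_FIELD] = str(year)
    # Assumption: name the control that triggered the postback,
    # as __doPostBack would.
    payload['__EVENTTARGET'] = YEAR_FIELD
    return payload
```

Calling build_payload(response.text, 2013) on the HTML returned by the GET request would give a dictionary suitable for the data argument of the POST. Collecting every input this way avoids hard-coding the list of hidden fields, at the cost of sending any extra inputs the page happens to contain.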

Implementation example for year 2013 (using requests and BeautifulSoup):

from bs4 import BeautifulSoup
import requests

url = 'http://www.koreabaseball.com/Record/Player/HitterDetail/Game.aspx?playerId=76325'

# Hidden ASP.NET form fields that must be echoed back in the POST
hidden_fields = ('__EVENTTARGET', '__EVENTARGUMENT', '__LASTFOCUS',
                 '__VIEWSTATE', '__VIEWSTATEGENERATOR', '__EVENTVALIDATION')

with requests.Session() as session:
    session.headers.update({'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36'})

    # step 1: GET the page and read the current form input values
    response = session.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')

    data = {
        'ctl00$ctl00$cphContainer$cphContents$ddlYear': '2013',
        'ctl00$ctl00$txtSearchWord': '',
    }
    for name in hidden_fields:
        data[name] = soup.find('input', {'name': name}).get('value', '')

    # step 2: POST the values back with the desired year
    response = session.post(url, data=data)

    soup = BeautifulSoup(response.content, 'html.parser')

    # print the cells of every row in the stats tables
    for row in soup.select('table.tData01 tr'):
        print([td.text for td in row.find_all('td')])

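This prints the contents of all stats tables for 2013. The output below was produced under Python 2; under Python 3 the u prefixes are absent and the Korean team names print as Hangul rather than \u escapes: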

[u'KIA', u'16', u'0.364', u'55', u'8', u'20', u'3', u'0', u'3', u'11', u'5', u'0', u'14', u'0', u'14', u'1']
[u'LG', u'15', u'0.321', u'53', u'7', u'17', u'1', u'0', u'2', u'9', u'1', u'1', u'6', u'3', u'10', u'2']
[u'NC', u'16', u'0.237', u'59', u'5', u'14', u'2', u'0', u'2', u'10', u'2', u'0', u'3', u'0', u'17', u'2']
[u'SK', u'16', u'0.235', u'51', u'7', u'12', u'1', u'0', u'3', u'13', u'1', u'3', u'13', u'1', u'12', u'4']
[u'\ub450\uc0b0', u'16', u'0.368', u'57', u'16', u'21', u'2', u'1', u'4', u'21', u'2', u'1', u'12', u'0', u'13', u'2']
[u'\ub86f\ub370', u'16', u'0.375', u'56', u'9', u'21', u'4', u'0', u'3', u'13', u'4', u'3', u'11', u'0', u'9', u'3']
[u'\uc0bc\uc131', u'16', u'0.226', u'62', u'8', u'14', u'5', u'0', u'3', u'10', u'0', u'0', u'8', u'1', u'15', u'1']
[u'\ud55c\ud654', u'15', u'0.211', u'57', u'7', u'12', u'3', u'0', u'2', u'9', u'0', u'0', u'1', u'1', u'19', u'3']
...
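
The example above fetches a single year. To cover every year, the values could be read from the dropdown itself instead of being hard-coded. The following is a minimal sketch using only the standard library; OptionCollector and dropdown_values are hypothetical names, and the default select_id is the id shown in the question's HTML.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Collect the value of every <option> inside one <select>, found by id."""

    def __init__(self, select_id):
        super().__init__()
        self.select_id = select_id
        self.inside = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'select':
            # Only options of the requested dropdown are collected.
            self.inside = attrs.get('id') == self.select_id
        elif tag == 'option' and self.inside and attrs.get('value'):
            self.values.append(attrs['value'])

    def handle_endtag(self, tag):
        if tag == 'select':
            self.inside = False


def dropdown_values(html, select_id='cphContainer_cphContents_ddlYear'):
    """Return the option values of the dropdown, in page order."""
    parser = OptionCollector(select_id)
    parser.feed(html)
    return parser.values
```

Each returned value could then be sent as the ctl00$ctl00$cphContainer$cphContents$ddlYear parameter in a POST like the one shown above. It is probably safest to re-read __VIEWSTATE and __EVENTVALIDATION from the most recent response before each request, since ASP.NET pages may issue new values on every postback.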

An example using Mechanize and Ruby. Modify the form field and submit.

#!/usr/bin/env ruby

require 'mechanize'

# Keep no page history
agent = Mechanize.new { |a| a.history.max_size = 0 }

agent.user_agent = 'Mozilla/5.0'

url = "http://www.koreabaseball.com/Record/Player/HitterDetail/Game.aspx?playerId=76325"

page = agent.get(url)

form = page.forms[0]

# Year selected when the page first loads
p form['ctl00$ctl00$cphContainer$cphContents$ddlYear']

# Choose another year and submit the form
form['ctl00$ctl00$cphContainer$cphContents$ddlYear'] = '2013'

page = form.submit

form = page.forms[0]

# Year selected in the returned page
p form['ctl00$ctl00$cphContainer$cphContents$ddlYear']
