
How to retrieve data from .aspx page using Python?

I want to pull data from an .aspx website whose form has several fields that take parameters. The data will then be analyzed in pandas. I'm obviously missing some steps here, so any help would be appreciated. The website is https://www.cocorahs.org/ViewData/StationPrecipSummary.aspx

I'm trying a simple approach: use the Python library Requests, get the JSON, and convert it to a DataFrame.

import pandas as pd
import requests

parameters = {'Station 1':'MD-BL-13','Start Date':'8/01/2019','End Date':'08/10/2019'}
response = requests.get('https://www.cocorahs.org/ViewData/StationPrecipSummary.aspx', params=parameters)
data = response.json()
pd.read_json(data)

I would like to get a DataFrame with columns 'Date' and 'Precip mm' covering the requested time period. Checking response.content shows that the parameters are not being applied: it contains only the web page as it appears before a query has been entered.
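
For context, ASP.NET Web Forms pages generally expect the form to be submitted with POST, using the inputs' `name` attributes rather than the labels shown on screen, together with hidden state fields such as `__VIEWSTATE`. That would explain why the GET parameters above are ignored. A minimal sketch of listing a form's input names with the standard library follows. The HTML fragment is made up for illustration and is not the real page markup, and `FormFieldCollector` and `collect_form_fields` are hypothetical helper names.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect name -> value for every named <input> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        if name:
            # Inputs without a value attribute are stored as empty strings.
            self.fields[name] = attr_map.get("value") or ""


def collect_form_fields(html_text):
    """Return a dict of input names and their current values."""
    parser = FormFieldCollector()
    parser.feed(html_text)
    return parser.fields


# Made-up fragment shaped like a Web Forms page (an assumption, not the real markup).
SAMPLE_HTML = """
<form method="post" action="./StationPrecipSummary.aspx">
  <input type="hidden" name="__VIEWSTATE" value="dummy-state" />
  <input type="hidden" name="__EVENTVALIDATION" value="dummy-validation" />
  <input type="text" name="tbStation1" />
  <input type="submit" name="btnSubmit" value="Get Summary" />
</form>
"""

fields = collect_form_fields(SAMPLE_HTML)
for name, value in fields.items():
    print(f"{name!r}: {value!r}")
```

Running the same helper on the text of a real GET response would show which names the server expects in the POST body. Whether a given page also requires fields like `__EVENTVALIDATION` varies from site to site.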

I find ASP.NET sites to be a pain in the ass to deal with, but here's a solution with pandas and requests-html.

from io import StringIO

import pandas as pd
from requests_html import HTMLSession

URL = 'https://www.cocorahs.org/ViewData/StationPrecipSummary.aspx'

with HTMLSession() as s:
    # Load the form first: ASP.NET expects its hidden __VIEWSTATE value back.
    r = s.get(URL)
    viewstate = r.html.find('input[name=__VIEWSTATE]', first=True).attrs.get('value')

    # Keys are the form's input names, not the labels shown on the page.
    payload = {
        '__EVENTTARGET': '',
        '__VIEWSTATE': viewstate,
        'obsSwitcher:ddlObsUnits': 'usunits',
        'tbStation1': 'MD-BL-13',
        'ucDateRangeFilter:dcStartDate': '8/1/2019',
        'ucDateRangeFilter_dcStartDate_p': '2019-8-1-0-0-0-0',
        'ucDateRangeFilter:dcEndDate': '8/10/2019',
        'ucDateRangeFilter_dcEndDate_p': '2019-8-10-0-0-0-0',
        'btnSubmit': 'Get Summary'
        }

    # Submit the form, then parse the results table into a DataFrame.
    r = s.post(URL, data=payload)
    table = r.html.find('table.Grid', first=True)
    # Newer pandas versions expect a file-like object rather than a raw HTML string.
    df = pd.read_html(StringIO(table.html), header=0)[0]
    print(df)


          Date Precip in.
0   08/01/2019       0.00
1   08/02/2019       0.00
2   08/03/2019       0.00
3   08/04/2019       0.00
4   08/05/2019       0.00
5   08/06/2019       0.00
6   08/07/2019          T
7   08/08/2019       1.73
8   08/09/2019         --
9   08/10/2019         --
10    Totals :   1.73 in.
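
The table comes back in inches with a totals row and non-numeric markers, while the question asks for 'Date' and 'Precip mm' columns. The following is a rough sketch of one way to clean it up, rebuilding the DataFrame from the rows printed above so that it runs without network access. It assumes that "T" marks a trace amount (counted as 0 here) and that "--" means no report. Both readings are assumptions and worth checking against the site's own documentation.

```python
import pandas as pd

# Rows copied from the printed output above, kept as strings.
df = pd.DataFrame(
    {
        "Date": [
            "08/01/2019", "08/02/2019", "08/03/2019", "08/04/2019",
            "08/05/2019", "08/06/2019", "08/07/2019", "08/08/2019",
            "08/09/2019", "08/10/2019", "Totals :",
        ],
        "Precip in.": [
            "0.00", "0.00", "0.00", "0.00", "0.00", "0.00",
            "T", "1.73", "--", "--", "1.73 in.",
        ],
    }
)

# Parse dates; the "Totals :" row fails to parse and becomes NaT, so it is dropped.
dates = pd.to_datetime(df["Date"], format="%m/%d/%Y", errors="coerce")
daily = df.assign(Date=dates).dropna(subset=["Date"]).copy()

# Convert to numbers: anything non-numeric (such as "--") becomes NaN.
raw = daily["Precip in."]
inches = pd.to_numeric(raw, errors="coerce")
# Assumption: "T" is a trace amount, treated as zero.
inches = inches.mask(raw == "T", 0.0)

# 1 inch = 25.4 mm.
daily["Precip mm"] = (inches * 25.4).round(2)
result = daily[["Date", "Precip mm"]].reset_index(drop=True)
print(result)
```

Treating a trace as zero is only one choice; leaving it as NaN or using a small placeholder may suit some analyses better.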


 