Python3 requests library to submit form that disallows post request

I am trying to get the police district for a given location from the Philly Police webpage. I have too many locations to do this by hand, so I am trying to automate the process using Python's requests library. The form on the webpage that holds the location value is as follows:

<form id="search-form" method="post" action="districts/searchAddress">
<fieldset>
    <div class="clearfix">
        <label for="search-address-box"><span>Enter Your Street Address</span></label>
        <div class="input">
            <input tabindex="1" class="district-street-address-input" id="search-address-box" name="name" type="text" value="">
        </div>
    </div>
    <div class="actions" style="float: left;">
        <button tabindex="3" type="submit" class="btn btn-success">Search</button>
    </div>
    <a id="use-location" href="https://www.phillypolice.com/districts/index.html?_ID=7&_ClassName=DistrictsHomePage#" style="float: left; margin: 7px 0 0 12px;"><i class="icon-location-arrow"></i>Use Current Location</a>
    <div id="current-location-display" style="display: none;"><p>Where I am right now.</p></div>
</fieldset>
</form>

However, when I try to POST or PUT to the webpage using the following:

r = requests.post('http://www.phillypolice.com/districts',data={'search-address-box':'425 E. Roosevelt Blvd'})

I get error 405: POST is not allowed. I then turned off JavaScript and tried to find the district on the webpage, and when I hit submit I received the same 405 error message. So the form itself is evidently never submitted, and the district is looked up using JavaScript.

Is there a way to simulate 'clicking' the submit button to trigger the JavaScript using the requests library?

The data is retrieved by first querying Google Maps for the coordinates; the final request is then a GET like the following:

[image not preserved]

You can set up a free account with the Bing Maps API and get the coordinates you need to make the GET request:

import requests

key = "my_key"
coord_params = {"output": "json",
                "key": key}

# This provides the coordinates.
coords_url = "https://dev.virtualearth.net/REST/v1/Locations"

# Template to pass each address to in your actual loop.
template = "{add},US"
url = "https://api.phillypolice.com/jsonservice/Map/searchAddress.json"
with requests.Session() as s:
    # Add the query param, passing in each address.
    coord_params["query"] = template.format(add="425 E. Roosevelt Blvd")
    js = s.get(coords_url, params=coord_params).json()
    # Parse latitude and longitude from the returned json.
    # Call str to turn the coordinates list into the string "[lat, lon]".
    latitude_longitude = str(js[u'resourceSets'][0][u'resources'][0]["point"][u'coordinates'])
    data = s.get(url, params={"latlng": latitude_longitude})

    print(data.json())

If we run it (with my key removed):

In [2]: import requests
   ...: 
   ...: key = "my_key..."
   ...: 
   ...: coord_params = {"output": "json",
   ...:                 "key": key}
   ...: coords_url = "https://dev.virtualearth.net/REST/v1/Locations"
   ...: template = "{add},US"
   ...: url = "https://api.phillypolice.com/jsonservice/Map/searchAddress.json"
   ...: with requests.Session() as s:
   ...:     coord_params["query"] = template.format(add="425 E. Roosevelt Blvd")
   ...: 
   ...:     js = s.get(coords_url, params=coord_params).json()
   ...:     latitude_longitude = str(js[u'resourceSets'][0][u'resources'][0]["point"][u'coordinates'])
   ...:     print(latitude_longitude)
   ...:     data = s.get(url, params={"latlng": latitude_longitude})
   ...:     print(data.json())
   ...:     
[40.02735900878906, -75.1153564453125]
{'response': ['35', '2', 'Marques Newsome', 'PPD.35_PSA2@phila.gov ', '267-357-1436']}

You can see that it matches the response you get if you inspect the request in your browser.
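Since the question is about many locations, the two requests above could be wrapped in a small helper and run per address. This is only a sketch under assumptions: the function names and the `FIELDS` labels are mine, and the meaning of the last three values in the `response` list is guessed from the sample output rather than documented. The `session` argument is expected to behave like `requests.Session`.

```python
COORDS_URL = "https://dev.virtualearth.net/REST/v1/Locations"
DISTRICT_URL = "https://api.phillypolice.com/jsonservice/Map/searchAddress.json"

# Assumed labels for the five values in the "response" list. Only the first
# two (district and PSA) are corroborated elsewhere in this thread; the
# rest are guesses based on the sample output.
FIELDS = ("district", "psa", "contact_name", "contact_email", "contact_phone")


def parse_district(payload):
    """Turn {'response': [...]} into a dict keyed by the assumed FIELDS."""
    values = [value.strip() for value in payload["response"]]
    return dict(zip(FIELDS, values))


def lookup_district(session, address, key):
    """Geocode one address, then ask the district endpoint about it.

    `session` must offer .get(url, params=...) returning an object with
    .json(), as requests.Session does.
    """
    geo = session.get(
        COORDS_URL,
        params={
            "output": "json",
            "key": key,
            "query": "{add},US".format(add=address),
        },
    ).json()
    coordinates = geo["resourceSets"][0]["resources"][0]["point"]["coordinates"]
    # str() of the list yields "[lat, lon]", the form sent in the code above.
    reply = session.get(DISTRICT_URL, params={"latlng": str(coordinates)})
    return parse_district(reply.json())


def lookup_many(session, addresses, key):
    """Map each address to its parsed record, reusing a single session."""
    return {
        address: lookup_district(session, address, key)
        for address in addresses
    }
```

With the real library this would be called as `with requests.Session() as s: results = lookup_many(s, addresses, key)`. There is no error handling here, so an address that cannot be geocoded would surface as an exception from the indexing; a real run over many addresses would probably want to catch that and add a pause between requests.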

Two major things happen when you hit "submit": a request goes to the Google geocode service, and an XHR request goes to the "searchAddress.json" endpoint using the coordinates returned by the geocode service.

You can try to simulate the above requests, carefully handling all the API keys and required parameters, or you can stay on a higher level and use browser automation via selenium.

Working example using the PhantomJS headless browser:

In [2]: from selenium import webdriver

In [3]: driver = webdriver.PhantomJS()

In [4]: driver.get("https://www.phillypolice.com/districts/")

In [5]: address = "425 E. Roosevelt Blvd"

In [6]: search_box = driver.find_element_by_id("search-address-box")

In [7]: search_box.send_keys(address)

In [8]: search_box.submit()

In [9]: driver.find_element_by_css_selector("#district-menu h2").text
Out[9]: u'35th District'

In [10]: driver.find_element_by_css_selector("#district-menu h4").text
Out[10]: u'PSA 2'

You might also need Explicit Waits to handle timing issues.
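To illustrate what an explicit wait does, here is a plain-Python sketch of the polling idea. It is not Selenium's own implementation: `wait_for` is a hypothetical helper, and its default timeout and polling interval are arbitrary. It keeps calling a probe until the probe returns something truthy, treating exceptions (such as an element not being in the DOM yet) as "not ready".

```python
import time


def wait_for(probe, timeout=10.0, poll=0.5):
    """Call probe() repeatedly until it returns a truthy value.

    Exceptions raised by probe() are treated as "not ready yet".
    Raises TimeoutError if nothing truthy is returned within `timeout`
    seconds; the last probe exception, if any, is attached as the cause.
    """
    deadline = time.monotonic() + timeout
    last_error = None
    while True:
        try:
            value = probe()
            if value:
                return value
        except Exception as exc:
            last_error = exc
        if time.monotonic() >= deadline:
            message = "condition not met within {} seconds".format(timeout)
            raise TimeoutError(message) from last_error
        time.sleep(poll)
```

In the session above it could replace the direct lookup, for example `wait_for(lambda: driver.find_element_by_css_selector("#district-menu h2").text)`. Selenium ships its own version of this pattern as `WebDriverWait` in `selenium.webdriver.support.ui`, usually combined with `expected_conditions`, and that is likely the better choice in real code.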
