Python3 requests library to submit form that disallows post request

I am trying to get the police district for a given location from the Philly Police webpage. I have too many locations to do this by hand, so I am trying to automate the process using Python's requests library. The form on the page that holds the location value is as follows:

<form id="search-form" method="post" action="districts/searchAddress">
<fieldset>
    <div class="clearfix">
        <label for="search-address-box"><span>Enter Your Street Address</span></label>
        <div class="input">
            <input tabindex="1" class="district-street-address-input" id="search-address-box" name="name" type="text" value="">
        </div>
    </div>
    <div class="actions" style="float: left;">
        <button tabindex="3" type="submit" class="btn btn-success">Search</button>
    </div>
    <a id="use-location" href="https://www.phillypolice.com/districts/index.html?_ID=7&_ClassName=DistrictsHomePage#" style="float: left; margin: 7px 0 0 12px;"><i class="icon-location-arrow"></i>Use Current Location</a>
    <div id="current-location-display" style="display: none;"><p>Where I am right now.</p></div>
</fieldset>
</form>

However, when I try to POST or PUT to the webpage using the following:

r = requests.post('http://www.phillypolice.com/districts',data={'search-address-box':'425 E. Roosevelt Blvd'})

I get a 405 error: POST is not allowed. I then turned off JavaScript and tried to find the district on the webpage; when I hit submit, I received the same 405 error message. So the form itself is never submitted, and the district must be looked up with JavaScript.

Is there a way to simulate 'clicking' the submit button to trigger the JavaScript using the requests library?
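
One detail worth checking in the attempt above: a browser submits each input under its name attribute (here name="name"), not its id, and sends it to the form's action. The posted snippet uses the id as the key and targets /districts instead. The following is a minimal sketch, using only the standard library, that reads those values out of the form markup. FormInspector and FORM_HTML are illustrative names, and the markup is trimmed from the form shown above. This does not get around the 405; it only shows what the form would send.

```python
from html.parser import HTMLParser


class FormInspector(HTMLParser):
    """Collect the method, action and input names of the first <form>."""

    def __init__(self):
        super().__init__()
        self.method = None
        self.action = None
        self.field_names = []
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            # HTML forms default to GET when no method is given.
            self.method = (attrs.get("method") or "get").lower()
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form and attrs.get("name"):
            self.field_names.append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


# Trimmed copy of the form from the question.
FORM_HTML = """
<form id="search-form" method="post" action="districts/searchAddress">
  <input id="search-address-box" name="name" type="text" value="">
  <button type="submit">Search</button>
</form>
"""

inspector = FormInspector()
inspector.feed(FORM_HTML)
print(inspector.method, inspector.action, inspector.field_names)
```

So a faithful replay of the form would send data={'name': ...} to the action URL, although per the test with JavaScript disabled the server still rejects that POST.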

The data is retrieved by first querying Google Maps to get the coordinates; the final request is a GET like the following:

[screenshot of the GET request]

You can set up a free account with the Bing Maps API and get the coordinates you need to make the GET request:

import requests

key = "my_key"
coord_params = {"output": "json",
                "key": key}

# This provides the coordinates.
coords_url = "https://dev.virtualearth.net/REST/v1/Locations"

# Template to pass each address to in your actual loop.
template = "{add},US"
url = "https://api.phillypolice.com/jsonservice/Map/searchAddress.json"
with requests.Session() as s:
    # Add the query param, passing in each address.
    coord_params["query"] = template.format(add="425 E. Roosevelt Blvd")
    js = s.get(coords_url, params=coord_params).json()
    # Parse latitude and longitude from the returned json.
    # Call str to turn the list into the string "[lat, lon]".
    latitude_longitude = str(js[u'resourceSets'][0][u'resources'][0]["point"][u'coordinates'])
    data = s.get(url, params={"latlng": latitude_longitude})

    print(data.json())
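
When looping over many addresses, some lookups will probably return no match, and the chained indexing above would then raise an IndexError or KeyError. Here is a minimal sketch of a more defensive extraction. It assumes only the JSON shape used above (resourceSets, resources, point, coordinates); extract_latlng is a hypothetical helper name, not part of any API.

```python
def extract_latlng(geocode_json):
    """Return the string "[lat, lon]" for the first match, or None if there is none."""
    for resource_set in geocode_json.get("resourceSets", []):
        for resource in resource_set.get("resources", []):
            coords = resource.get("point", {}).get("coordinates")
            if coords and len(coords) == 2:
                lat, lon = coords
                # Same formatting as str(list) in the code above.
                return str([lat, lon])
    return None


# Hand-written sample shaped like the geocoder response used above.
sample = {
    "resourceSets": [
        {"resources": [{"point": {"coordinates": [40.02735900878906, -75.1153564453125]}}]}
    ]
}
print(extract_latlng(sample))
print(extract_latlng({"resourceSets": [{"resources": []}]}))
```

In the loop you could then skip or log any address for which the helper returns None instead of letting one bad address stop the run.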

If we run it minus my key:

In [2]: import requests
   ...: 
   ...: key = "my_key..."
   ...: 
   ...: coord_params = {"output": "json",
   ...:                 "key": key}
   ...: coords_url = "https://dev.virtualearth.net/REST/v1/Locations"
   ...: template = "{add},US"
   ...: url = "https://api.phillypolice.com/jsonservice/Map/searchAddress.json"
   ...: with requests.Session() as s:
   ...:     coord_params["query"] = template.format(add="425 E. Roosevelt Blvd")
   ...: 
   ...:     js = s.get(coords_url, params=coord_params).json()
   ...:     latitude_longitude = str(js[u'resourceSets'][0][u'resources'][0]["point"][u'coordinates'])
   ...:     print(latitude_longitude)
   ...:     data = s.get(url, params={"latlng": latitude_longitude})
   ...:     print(data.json())
   ...:     
[40.02735900878906, -75.1153564453125]
{'response': ['35', '2', 'Marques Newsome', 'PPD.35_PSA2@phila.gov ', '267-357-1436']}

You can see that it matches the response shown for the same request in your browser.
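
The response is a bare list, so it may help to give the positions names before storing results for many addresses. This is a rough sketch: the first two fields line up with the district and PSA numbers, but the labels contact, email and phone are guesses based on what the values look like, not documented field names. DistrictInfo and parse_district are hypothetical names.

```python
from collections import namedtuple

# Field names are assumptions; only the order comes from the response above.
DistrictInfo = namedtuple("DistrictInfo", "district psa contact email phone")


def parse_district(payload):
    """Turn {'response': [...]} into a DistrictInfo, stripping stray whitespace."""
    values = payload.get("response") or []
    if len(values) != 5:
        raise ValueError("unexpected response shape: %r" % (payload,))
    return DistrictInfo(*(str(v).strip() for v in values))


sample = {"response": ["35", "2", "Marques Newsome", "PPD.35_PSA2@phila.gov ", "267-357-1436"]}
info = parse_district(sample)
print(info.district, info.psa)
```

Note the trailing space in the email value of the sample response, which the helper strips.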

Two major things happen when you hit "submit": a request to the Google geocode service, and an XHR request to the "searchAddress.json" endpoint that uses the coordinates returned by the geocode service.

You can try to simulate the above requests, carefully handling all the API keys and required parameters, or you can stay at a higher level and use browser automation via selenium.

Working example using the PhantomJS headless browser:

In [2]: from selenium import webdriver

In [3]: driver = webdriver.PhantomJS()

In [4]: driver.get("https://www.phillypolice.com/districts/")

In [5]: address = "425 E. Roosevelt Blvd"

In [6]: search_box = driver.find_element_by_id("search-address-box")

In [7]: search_box.send_keys(address)

In [8]: search_box.submit()

In [9]: driver.find_element_by_css_selector("#district-menu h2").text
Out[9]: u'35th District'

In [10]: driver.find_element_by_css_selector("#district-menu h4").text
Out[10]: u'PSA 2'

You might also need Explicit Waits to handle the "timing" issues.
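
To illustrate the idea behind an explicit wait: the result appears only after the XHR finishes, so reading the element immediately after submit() can fail or return empty text. Selenium ships its own WebDriverWait class in selenium.webdriver.support.ui for this. The sketch below is not that class but a hand-rolled polling helper (wait_until is a hypothetical name) that shows the same principle without needing a browser to try it.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() until it returns a truthy value or the timeout expires.

    Exceptions raised by condition() are treated as "not ready yet", which is
    how a missing element would surface while the page is still loading.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except Exception:
            pass  # e.g. the element is not in the DOM yet
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %s seconds" % timeout)
        time.sleep(poll_interval)


# Usage with the session above (not executed here, needs a live driver):
# district = wait_until(
#     lambda: driver.find_element_by_css_selector("#district-menu h2").text
# )
```

Newer Selenium releases replace the find_element_by_* helpers with find_element(By.CSS_SELECTOR, ...), so the lambda would need adjusting there.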

