Sending requests to webpages using Python's urllib2

I am interested in using Python to automate certain tasks. Specifically, I would like to use Python to interact with websites: getting specific information from a page, making requests (POSTing data and reading the response), and downloading and uploading files. So far, I have only been able to use Python to fetch the HTML of a page with urllib2. The next thing I tried was sending a request to a page; I made several attempts, but they all failed.

    >>> import urllib2
    >>> import urllib
    >>> url = "http://www.stackoverflow.com/"
    >>> values = {}
    >>> values["input"] = "foo"
    >>> data = urllib.urlencode(values)
    >>> request = urllib2.Request(url + "search/", data)
    >>> response = urllib2.urlopen(request)
    >>> html = response.read()
    >>> print html

The way I understand things so far is that I need to create a dictionary with the names of the fields and their input values and encode it with urllib.urlencode(values). Then I need to make a request with urllib2.Request(theUrlReceivingTheRequest, data, headers), which, if given only a URL, will GET, but, if given data, will POST, and can be given headers that disguise the program as a common browser such as Firefox or IE. I then get a response with urllib2.urlopen(request), which returns a file-like object that I can read(). As I understand it, I can also use urllib2.build_opener(), which can receive handlers (that process cookies, redirections, authentication, etc.) and whose addheaders attribute can be set to default headers such as [("User-Agent", "...")]. I'd like to be able to eventually do (and understand) all of these things, but, first, I'd just like to get a form submitted. In the code above from my interactive Python session, did I follow the correct procedure? (I was attempting to enter a search for "foo" in the search field on the front page of stackoverflow.)
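To make those pieces concrete, here is a minimal sketch of that flow; the field name "input" and the browser string are only placeholders, and the real form fields would need to be inspected first:

    import urllib
    import urllib2
    import cookielib

    url = "http://www.stackoverflow.com/search/"
    # urlencode turns the dict into "input=foo"; the field name is a placeholder.
    data = urllib.urlencode({"input": "foo"})
    headers = {"User-Agent": "Mozilla/5.0"}  # example browser string
    # With a data argument the request is sent as POST; without it, as GET.
    request = urllib2.Request(url, data, headers)
    response = urllib2.urlopen(request)  # file-like object
    html = response.read()

    # build_opener() accepts handlers, e.g. one that keeps cookies across
    # requests; addheaders sets default headers on the opener.
    cookies = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookies))
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]
    response = opener.open(request)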

Your life will be easier if you use requests instead of urllib2. Here is your example with the requests API:

    import requests

    r = requests.post("http://www.stackoverflow.com/search/", data={"input": "foo"})
    print r.text

If you just want to get the search result using the GET method, you can inspect the HTML of the form:

    <form id="search" action="/search" method="get" autocomplete="off">
      <div>
        <input autocomplete="off" name="q" class="textbox" placeholder="search"
               tabindex="1" type="text" maxlength="140" size="28" value="foo"
               style="width: 200px; max-width: 200px;">
      </div>
    </form>

The action is "/search" and the input name is "q", so the request URL will be https://stackoverflow.com/search?q=foo.
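With requests, the same GET can be written by letting it build the query string from a params dict (a sketch):

    import requests

    # requests URL-encodes the dict and appends it to the URL as ?q=foo
    r = requests.get("https://stackoverflow.com/search", params={"q": "foo"})
    print r.text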

So simply opening the URL above with urllib2 will work.

You don't need to worry about request headers like "User-Agent": urllib2 adds a default one for you (something like Python-urllib/2.7). You can, however, set it explicitly.
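Setting it explicitly looks like this (a sketch; the browser string is only an example):

    import urllib2

    # Pass a headers dict to Request to override the default User-Agent.
    request = urllib2.Request("https://stackoverflow.com/search?q=foo",
                              headers={"User-Agent": "Mozilla/5.0"})
    print urllib2.urlopen(request).read()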

To make it work, change "input" to "q" and don't pass the "data" parameter to the request; otherwise it will use POST instead of GET. The program becomes:

    import urllib2
    import urllib

    url = "http://www.stackoverflow.com/"
    values = {}
    values["q"] = "foo"              # the form field is named "q"
    data = urllib.urlencode(values)  # -> "q=foo"
    # Append the encoded query string to the URL instead of passing it as
    # the data argument, so the request is sent as GET rather than POST.
    request = urllib2.Request(url + "search" + "?" + data)
    response = urllib2.urlopen(request)
    html = response.read()
    print html
