how can set headers (user-agent), retrieve a web page, capture redirects and accept cookies?

Question

import urllib.request
url="http://espn.com"
f = urllib.request.urlopen(url)
contents = f.read().decode('latin-1')
q = f.geturl()
print(q)

This code will return http://espn.go.com/ , which is what I want -- a redirect web site URL. After looking at the Python documentation, googling, etc., I can't figure out how to also:

Capture redirected web site URL (already working)
Change the user-agent on the out going request
Accept any cookies the web page might want to send back

How can I do this in Python 3? If there is a better module than urllib , I am OK with this.

Answer 1

There is a better module, it's called requests :

import requests

session = requests.Session()
session.headers['User-Agent'] = 'My-requests-agent/0.1'

resp = session.get(url)
contents = resp.text  # If the server said it's latin 1, this'll be unicode (ready decoded)
print(resp.url)       # final URL, after redirects.

requests follows redirects (check resp.history to see what redirects it followed). By using a session (optional), cookies are stored and passed on to subsequent requests. You can set headers per request or per session (so the same extra headers will be sent with every request sent out for that session).

Answer 2

A simple demo using urllib (python3):

#!/usr/bin/env python3
#-*- coding:utf-8 -*-

import os.path
import urllib.request
from urllib.parse import urlencode
from http.cookiejar import CookieJar,MozillaCookieJar

cj = MozillaCookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
urllib.request.install_opener(opener)

cookie_file=os.path.abspath('./cookies.txt')

def load_cookies(cj,cookie_file):
    cj.load(cookie_file)
def save_cookies(cj,cookie_file):
    cj.save(cookie_file,ignore_discard=True,ignore_expires=True)

def dorequest(url,cj=None,data=None,timeout=10,encoding='UTF-8'):
    data = urlencode(data).encode(encoding) if data else None

    request = urllib.request.Request(url)
    request.add_header('User-Agent','Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)')
    f = urllib.request.urlopen(request,data,timeout=timeout)
    return f.read()

def dopost(url,cj=None,data=None,timeout=10,encoding='UTF-8'):
    body = dorequest(url,cj,data,timeout,encoding)
    return body.decode(encoding)

You should check the headers if redirecting happening(30x).

how can set headers (user-agent), retrieve a web page, capture redirects and accept cookies?

Question

2 answers

solution1
8 ACCPTED 2013-01-04 21:17:38

solution2
7 2013-01-05 09:35:02

how can set headers (user-agent), retrieve a web page, capture redirects and accept cookies?

Question

2 answers

solution1 8 ACCPTED 2013-01-04 21:17:38

solution2 7 2013-01-05 09:35:02

solution1
8 ACCPTED 2013-01-04 21:17:38

solution2
7 2013-01-05 09:35:02