import urllib.request
url="http://espn.com"
f = urllib.request.urlopen(url)
contents = f.read().decode('latin-1')
q = f.geturl()
print(q)
This code prints http://espn.go.com/, which is what I want: the URL the site redirects to. After looking at the Python documentation, googling, etc., I can't figure out how to also:
How can I do this in Python 3? If there is a better module than urllib, I am happy to use it.
There is a better module; it's called requests:
import requests
url = "http://espn.com"
session = requests.Session()
session.headers['User-Agent'] = 'My-requests-agent/0.1'
resp = session.get(url)
contents = resp.text  # already decoded to str, using the encoding the server declared (e.g. latin-1)
print(resp.url)  # final URL, after redirects
requests follows redirects by default (check resp.history to see which redirects it followed). Using a session is optional; with one, cookies are stored and passed on to subsequent requests. You can set headers per request or per session, in which case the same extra headers are sent with every request made through that session.
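As a rough sketch of the points above, the snippet below lists every hop recorded in resp.history and shows a session-level header next to a per-request one. The names redirect_chain and fetch, the Accept header, and the 10-second timeout are illustrative assumptions, not part of requests or of the answer itself.

```python
def redirect_chain(resp):
    """Return [(status_code, url), ...] for every hop, ending with the final response.

    Works on any object shaped like requests.Response
    (attributes: history, status_code, url).
    """
    hops = [(r.status_code, r.url) for r in resp.history]
    hops.append((resp.status_code, resp.url))
    return hops


def fetch(url, user_agent="My-requests-agent/0.1", timeout=10):
    """GET url through a session; return the response and the cookies collected."""
    # Third-party package (pip install requests). Imported here so that
    # redirect_chain can be used without it.
    import requests

    with requests.Session() as session:
        # Session-level header: sent with every request made through this session.
        session.headers["User-Agent"] = user_agent
        # Per-request header: sent with this request only (hypothetical choice).
        resp = session.get(url, headers={"Accept": "text/html"}, timeout=timeout)
        # Cookies set along the redirect chain are kept on the session.
        cookies = session.cookies.get_dict()
    return resp, cookies
```

Calling fetch(url) and passing the returned response to redirect_chain gives one (status, URL) pair per redirect followed, with the final response last. If no redirect happened, the list has a single entry.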
A simple demo using urllib (python3):
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os.path
import urllib.request
from urllib.parse import urlencode
from http.cookiejar import MozillaCookieJar

# Install a global opener that stores cookies in cj and sends them back.
cj = MozillaCookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
urllib.request.install_opener(opener)
cookie_file = os.path.abspath('./cookies.txt')

def load_cookies(cj, cookie_file):
    # The file must already exist and be in Mozilla cookies.txt format.
    cj.load(cookie_file)

def save_cookies(cj, cookie_file):
    # Also keep session cookies and cookies that have expired.
    cj.save(cookie_file, ignore_discard=True, ignore_expires=True)

def dorequest(url, cj=None, data=None, timeout=10, encoding='UTF-8'):
    # cj is not used here: the opener installed above already carries the jar.
    # Passing data (a dict) turns the request into a POST.
    data = urlencode(data).encode(encoding) if data else None
    request = urllib.request.Request(url)
    request.add_header('User-Agent', 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)')
    with urllib.request.urlopen(request, data, timeout=timeout) as f:
        return f.read()

def dopost(url, cj=None, data=None, timeout=10, encoding='UTF-8'):
    body = dorequest(url, cj, data, timeout, encoding)
    return body.decode(encoding)
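To detect a redirect (a 30x response), check the response headers, or compare f.geturl() with the URL you requested, since urlopen follows redirects automatically.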
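One possible way to check for 30x redirects with urllib is to record each one as it is followed. The sketch below subclasses urllib.request.HTTPRedirectHandler; the names RecordingRedirectHandler and open_and_trace are hypothetical and not part of the demo above.

```python
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follow redirects as usual, but remember every hop."""

    def __init__(self):
        super().__init__()
        self.hops = []  # list of (status_code, from_url, to_url)

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Let the standard handler decide whether the redirect is allowed.
        new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
        if new_req is not None:
            self.hops.append((code, req.full_url, newurl))
        return new_req


def open_and_trace(url, timeout=10):
    """Open url and return (final_url, final_status, hops)."""
    recorder = RecordingRedirectHandler()
    # build_opener uses our handler in place of the default redirect handler.
    opener = urllib.request.build_opener(recorder)
    with opener.open(url, timeout=timeout) as f:
        return f.geturl(), f.status, recorder.hops
```

An empty hops list means the server answered the original URL directly. To combine this with the cookie handling above, both handlers can be passed to the same build_opener call.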