Getting a file from an authenticated site (with python urllib, urllib2)

Question

I'm trying to get a queried Excel file from a site. When I enter the direct link, it leads to a login page, and once I've entered my username and password, it proceeds to download the Excel file automatically. I am trying to avoid installing any additional module that's not part of standard Python (this script will run on a "standardized machine", and it won't work if the module is not installed).

I've tried the following, but the saved Excel file just contains the login page's content :-|

import urllib

url = "myLink_queriedResult/result.xls"
urllib.urlretrieve(url,"C:\\test.xls")

So then I looked into using urllib2 with password authentication, but now I'm stuck.

I have the following code:

import urllib2
import urllib

theurl = 'myLink_queriedResult/result.xls'
username = 'myName'
password = 'myPassword'

passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, theurl, username, password)

authhandler = urllib2.HTTPBasicAuthHandler(passman)
opener = urllib2.build_opener(authhandler)
urllib2.install_opener(opener)
pagehandle = urllib2.urlopen(theurl)
pagehandle.read()  # but it seems to still contain only a 'login page'

Appreciate any advice in advance. :)

Answer 1

urllib is generally passed over these days in favor of Requests.

This would do what you want:

import requests
from requests.auth import HTTPBasicAuth

theurl= 'myLink_queriedResult/result.xls'
username = 'myUsername'
password = 'myPassword'

r=requests.get(theurl, auth=HTTPBasicAuth(username, password))

The Requests documentation has more information on authentication.
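Since the question asks for standard-library code only, the same HTTP Basic authentication can be sketched with urllib.request in Python 3. This is a minimal sketch, not a definitive implementation: the helper names (basic_auth_header, build_request, download_with_basic_auth) are hypothetical, and it only helps if the server really uses HTTP Basic authentication. If the site shows an HTML login form, Basic credentials are ignored and you get the login page back.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Return the value of an HTTP Basic 'Authorization' header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_request(url, username, password):
    """Build a request that sends the credentials with the first request."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


def download_with_basic_auth(url, username, password, dest_path):
    """Fetch url with Basic auth and write the response body to dest_path."""
    request = build_request(url, username, password)
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out:
        out.write(response.read())
```

Usage would look like download_with_basic_auth(theurl, username, password, "result.xls"), with theurl being a full URL including the scheme.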

Answer 2

You can try it this way with Python 3 (note that requests, requests_ntlm and pandas are third-party packages):

    import requests
    # import the authentication method the site requires (NTLM here)
    from requests_ntlm import HttpNtlmAuth
    import pandas as pd
    from io import BytesIO

    # pandas needs the xlrd package installed to read legacy .xls files
    r = requests.get("http://example.website", auth=HttpNtlmAuth('acc', 'password'))
    xd = pd.read_excel(BytesIO(r.content))


Answer 3

You will need to use cookies to keep the authenticated session.

import http.cookiejar
import urllib.parse
import urllib.request

# Placeholders: replace with the real URL and credentials.
the_url = 'myLink_queriedResult/result.xls'
username = 'myName'
password = 'myPassword'

# Check the input names of the login form by inspecting the page source.
values = {'username': username, 'password': password}
data = urllib.parse.urlencode(values).encode("utf-8")
cookies = http.cookiejar.CookieJar()

# Create an "opener" (OpenerDirector instance) that stores cookies.
opener = urllib.request.build_opener(
    urllib.request.HTTPRedirectHandler(),
    urllib.request.HTTPHandler(debuglevel=0),
    urllib.request.HTTPSHandler(debuglevel=0),
    urllib.request.HTTPCookieProcessor(cookies))

# Use the opener to POST the login data; the session cookie is kept in `cookies`.
response = opener.open(the_url, data)

# Install the opener.
# Now all calls to urllib.request.urlopen use our opener (and its cookies).
urllib.request.install_opener(opener)
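After a login attempt, it can help to check whether the bytes you received are really a spreadsheet or just the login page again, which is the symptom described in the question. The following is a rough sketch using only file signatures: legacy .xls files are OLE2 compound files and .xlsx files are ZIP archives. The function names are hypothetical, and a server could also send other formats (for example CSV) under an .xls name, so treat a negative result as a hint rather than proof.

```python
# Signature of an OLE2 compound file (legacy .xls workbooks).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Signature of a ZIP archive (.xlsx workbooks are ZIP containers).
ZIP_MAGIC = b"PK\x03\x04"


def looks_like_excel(payload):
    """Return True if the payload starts with a known Excel container signature."""
    return payload.startswith(OLE2_MAGIC) or payload.startswith(ZIP_MAGIC)


def looks_like_html(payload):
    """Return True if the payload appears to start an HTML document."""
    head = payload[:512].lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")
```

For example, calling looks_like_html(response.read()) on the result of opener.open(...) would flag the case where the login did not succeed and the site served its login page instead of the file.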

Answer 4

You can use requests.get to download the file. Try this sample code:

import requests
from requests.auth import HTTPBasicAuth

def download_file(user_name, user_pwd, url, file_path):
    file_name = url.rsplit('/', 1)[-1]
    with requests.get(url, stream = True, auth = HTTPBasicAuth(user_name, user_pwd)) as response:
        with open(file_path + "/" + file_name, 'wb') as f:
            for chunk in response.iter_content(chunk_size = 8192):
                f.write(chunk)

# This downloads login.html into /home/dan/
download_file("dan", "password", "http://www.example.com/login.html", "/home/dan/")

Enjoy it!!
