
Can't download a cookie-protected file with Python

I've spent all day trying to solve this. The page http://www.some.site/index.php asks for a username and password and sets a cookie. I log in like this:

import urllib, urllib2, cookielib, os
import re # not required here but tried it out though
import requests # not required here but tried it out though
username = 'somebody'
password = 'somepass'

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username' : username, 'j_password' : password})
resp = opener.open('http://www.some.site/index.php', login_data)
print resp.read()

The problem is that in the middle of the page there is a link to download an .xls file: http://www.some.site/excel_file.php?/t=1303457489. I can download the file in any browser (Mozilla, Chrome, IE), but not with Python. The query string after .php (e.g. ?t=1370919996) changes every time I log in or refresh the page.

Maybe I'm wrong, but I believe that value is generated from the cookie (or session cookie); however, the cookie contains only this: ('set-cookie', 'PHPSESSID=9cde55534fcc8e136fcf6588c0d0f1df; path=/')
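For context, PHPSESSID is PHP's session identifier: an opaque key, while the session data it refers to lives on the server. So the t= value is most likely produced by the server when it renders the page rather than computed from the cookie; the client only has to send the same cookie back with every request. A minimal Python 3 sketch (in Python 2, as used in the question, the module is named Cookie) that parses the header quoted above with the standard library's http.cookies shows that the cookie carries nothing but that ID and a path:

```python
from http.cookies import SimpleCookie


def parse_set_cookie(header_value):
    """Return {name: (value, path)} for each cookie in a Set-Cookie header value."""
    jar = SimpleCookie()
    jar.load(header_value)
    # Attribute names such as "path" are stored lowercased on each Morsel
    return {name: (morsel.value, morsel["path"]) for name, morsel in jar.items()}


# The exact header value quoted in the question
cookies = parse_set_cookie("PHPSESSID=9cde55534fcc8e136fcf6588c0d0f1df; path=/")
print(cookies)
```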

This is one way I tried to save the file:

print "downloading with urllib2"
f = urllib2.urlopen('http://www.some.site/excel_file.php')
data = f.read()
with open("exceldoc.xls", "wb") as code:
    code.write(data)

Whether I save the response or print it, I get the same error:

<b>Fatal error</b>:  Call to a member function FetchRow() on a non-object in <b>http://www.some.site/excel_file.php</b> on line <b>112</b><br 

How can I download this file with Python? Thank you so much in advance for any help!

There are many similar posts; I've checked them and my examples are inspired by them, yet nothing has worked for me. I'm not very familiar with cookies, PHP or JS.

EDIT: this is what I get when I print out the content of index.php:

<html>
<head>
<title>SOMETITLE</title>
<meta http-equiv="Page-Enter" content="blendTrans(Duration=0.5)">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<link rel='stylesheet' type='text/css' href='somesite.css'>
<SCRIPT LANGUAGE="JavaScript">
<!-- JavaScript hiding

function clearDefault(obj) {
    if (!obj._cleared) {
                obj.value='';
                obj._cleared=true;
    }
}

// -->
</SCRIPT>
</head>

<body bgcolor="#FFFFFF" text="#000000">

<table width="100%" border="0" align="center" cellpadding="0" cellspacing="0">
  <tr>
    <td>
      <table width="1000" height="150" border="0" align="center" cellpadding="16" cellspacing="0" class="header" style="background: #989896 url('images/header.png') no-repeat;">
        <tr>
          <td valign="middle">
            <table width="100%" border="0" align="center" cellpadding="0" cellspacing="0">
              <tr>
                <td width="380">&nbsp;</td>
                <td>
                  <div id="login">
                       <form name="flogin" method="post" action="/index.php">
                      <h1>Login</h1>
                      <input name="uName" type="text" value="Username:" class="name" onfocus="clearDefault(this)">
                      <br>
                      <input type="password" name="uPw"  value="Password:" class="pass" onfocus="clearDefault(this)">
                      <input type="submit" name="Submit" value="OK" class="submit">
                    </form>
                  </div>                                                                
                                                                </td>
              </tr>
            </table>
          </td>
        </tr>
      </table>
                </td>
  </tr>
</table>

</body>
</html>
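Note that this form posts the fields uName, uPw and Submit, whereas the login code above sends username and j_password, and the page printed after "logging in" is still the login form. A minimal Python 3 sketch, assuming the form markup shown above, that reads the action and the declared field names out of the form with the standard library's html.parser and builds a matching POST body (LoginFormParser is a hypothetical helper name; the credentials are the question's placeholders):

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class LoginFormParser(HTMLParser):
    """Record the action and the named <input> fields of the first <form>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}  # input name -> default value
        self._in_form = False
        self._seen_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "form" and not self._seen_form:
            self._in_form = self._seen_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


# The relevant part of the index.php output shown in the EDIT
FORM_HTML = """
<form name="flogin" method="post" action="/index.php">
  <input name="uName" type="text" value="Username:" class="name" onfocus="clearDefault(this)">
  <input type="password" name="uPw"  value="Password:" class="pass" onfocus="clearDefault(this)">
  <input type="submit" name="Submit" value="OK" class="submit">
</form>
"""

parser = LoginFormParser()
parser.feed(FORM_HTML)
# Keep the form's own defaults (e.g. Submit=OK) and fill in the credentials
body = urlencode(dict(parser.fields, uName="somebody", uPw="somepass"))
print(parser.action, body)
```

Posting a body built this way, rather than one with guessed field names, is what lets the server actually start the session.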

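You could try to parse the response from the first code section and use the extracted URL with the same opener. The download snippet in the question uses urllib2.urlopen, which does not go through the CookieJar attached to opener, so the PHPSESSID cookie is never sent; the PHP script then most likely runs without a session, its query fails, and you get the FetchRow() error. Also note that the login form posts uName and uPw, not username and j_password; the index.php output in the EDIT is still the login form, which suggests the login itself did not succeed. Without knowing the actual format of the link: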

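import urllib, urllib2, cookielib, urlparse
import re # going to use this now!

username = 'somebody'
password = 'somepass'
login_url = 'http://www.some.site/index.php'

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# Use the field names declared by the login form: uName, uPw and Submit
login_data = urllib.urlencode({'uName' : username, 'uPw' : password, 'Submit' : 'OK'})
resp = opener.open(login_url, login_data)
content = resp.read()
print content # should now contain the .xls link instead of the login form

# Escape "." and "?" so they match literally, allow an optional "/" after "?"
# (the question shows both "?/t=" and "?t="), and accept relative or absolute hrefs
match = re.search(
    r"href=[\"'](?P<file_link>[^\"']*excel_file\.php\?/?t=\d+)[\"']",
    content,
    re.IGNORECASE
)

assert match is not None, "Couldn't find the file link..."

# Resolve a relative link against the login URL
file_link = urlparse.urljoin(login_url, match.group('file_link'))
print "downloading {} with urllib2".format(file_link)
# Use the same opener, so the PHPSESSID cookie from the login is sent along
f = opener.open(file_link)
data = f.read()
with open("exceldoc.xls", "wb") as code:
    code.write(data)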
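For reference, a Python 3 sketch of the same approach (urllib.request and http.cookiejar replace urllib2 and cookielib), assuming the form field names from the question and a link of the form excel_file.php?t=<digits>. Since www.some.site can't be reached here, the sketch also starts a hypothetical local stand-in server (FakeSite) that only serves the file when the session cookie comes back; its link, cookie value and responses are invented for the demo. It also shows that a plain urlopen() call, like the urllib2.urlopen call in the question, arrives without the session cookie:

```python
import http.cookiejar
import os
import re
import tempfile
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed link shape: excel_file.php?t=<digits>, optionally written as ?/t=
LINK_RE = re.compile(r"""href=["']([^"']*excel_file\.php\?/?t=\d+)["']""", re.IGNORECASE)


def download_protected_file(login_url, username, password, dest_path):
    """Log in, scrape the per-session file link, and fetch it with the same opener."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Field names as declared by the login form shown in the question
    body = urllib.parse.urlencode({"uName": username, "uPw": password, "Submit": "OK"})
    with opener.open(login_url, body.encode("ascii")) as resp:
        page = resp.read().decode("utf-8", "replace")
    match = LINK_RE.search(page)
    if match is None:
        raise RuntimeError("file link not found; the login probably failed")
    # Reusing `opener` sends back the PHPSESSID cookie stored in `jar` at login
    with opener.open(urllib.parse.urljoin(login_url, match.group(1))) as resp:
        data = resp.read()
    with open(dest_path, "wb") as fh:
        fh.write(data)
    return data


# --- Hypothetical local stand-in for www.some.site, for demonstration only ---
SESSION_ID = "demo-session"


class FakeSite(BaseHTTPRequestHandler):
    def _reply(self, body, cookie=None):
        self.send_response(200)
        if cookie:
            self.send_header("Set-Cookie", cookie)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):  # login: checks the form fields and starts a session
        length = int(self.headers.get("Content-Length", 0))
        form = urllib.parse.parse_qs(self.rfile.read(length).decode("ascii"))
        if form.get("uName") == ["somebody"] and form.get("uPw") == ["somepass"]:
            self._reply(b'<a href="excel_file.php?t=1370919996">xls</a>',
                        cookie="PHPSESSID=%s; path=/" % SESSION_ID)
        else:
            self._reply(b"<form>login form again</form>")

    def do_GET(self):  # file download: only works with the session cookie
        has_session = ("PHPSESSID=" + SESSION_ID) in self.headers.get("Cookie", "")
        self._reply(b"XLS-BYTES" if has_session else b"Fatal error: no session")

    def log_message(self, *args):  # silence per-request logging
        pass


server = HTTPServer(("127.0.0.1", 0), FakeSite)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d/" % server.server_port
dest = os.path.join(tempfile.mkdtemp(), "exceldoc.xls")

downloaded = download_protected_file(base + "index.php", "somebody", "somepass", dest)
# A plain urlopen() has no cookie jar, like urllib2.urlopen in the question
with urllib.request.urlopen(base + "excel_file.php?t=1370919996") as resp:
    without_cookie = resp.read()
server.shutdown()
print(downloaded, without_cookie)
```

Against the real site you would drop the FakeSite part and call download_protected_file("http://www.some.site/index.php", username, password, "exceldoc.xls"). Keeping the login and the download on one opener is the whole point: the link is only valid for the session that rendered it.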
