
Can't download a cookie-protected file with Python

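I have been looking for a solution all day. There is this site, http://www.some.site/index.php, that asks for a username and password and sends a cookie. I log in like this: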

import urllib, urllib2, cookielib, os
import re # not required here but tried it out though
import requests # not required here but tried it out though
username = 'somebody'
password = 'somepass'

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username' : username, 'j_password' : password})
resp = opener.open('http://www.some.site/index.php', login_data)
print resp.read()

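The problem is that in the middle of the page there is a link to download an .xls file: http://www.some.site/excel_file.php?t=1303457489. I can download the file in any browser (Mozilla, Chrome, IE), but not with Python. The post data after .php (i.e. ?t=1370919996) changes every time I log in or refresh the page.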

Maybe I'm wrong, but I believe the post data is generated from the cookie (or session cookie). However, the cookie contains only this: ('set-cookie', 'PHPSESSID=9cde55534fcc8e136fcf6588c0d0f1df; path=/')

Here is one of the ways I tried to save the file:

print "downloading with urllib2"
f = urllib2.urlopen('http://www.some.site/excel_file.php')
data = f.read()
with open("exceldoc.xls", "wb") as code:
    code.write(data)

Whether I save the result or print it out, I get the same bad-request error:

<b>Fatal error</b>:  Call to a member function FetchRow() on a non-object in <b>http://www.some.site/excel_file.php</b> on line <b>112</b><br 

How can I download this file with Python? Thanks a lot in advance for your help!

There are many similar posts. I have gone through them and based my example on theirs, but it did not work for me. I'm not very familiar with cookies, PHP, or JS.

EDIT: this is what I get when I print out the content of index.php:

<html>
<head>
<title>SOMETITLE</title>
<meta http-equiv="Page-Enter" content="blendTrans(Duration=0.5)">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<link rel='stylesheet' type='text/css' href='somesite.css'>
<SCRIPT LANGUAGE="JavaScript">
<!-- JavaScript hiding

function clearDefault(obj) {
    if (!obj._cleared) {
                obj.value='';
                obj._cleared=true;
    }
}

// -->
</SCRIPT>
</head>

<body bgcolor="#FFFFFF" text="#000000">

<table width="100%" border="0" align="center" cellpadding="0" cellspacing="0">
  <tr>
    <td>
      <table width="1000" height="150" border="0" align="center" cellpadding="16" cellspacing="0" class="header" style="background: #989896 url('images/header.png') no-repeat;">
        <tr>
          <td valign="middle">
            <table width="100%" border="0" align="center" cellpadding="0" cellspacing="0">
              <tr>
                <td width="380">&nbsp;</td>
                <td>
                  <div id="login">
                       <form name="flogin" method="post" action="/index.php">
                      <h1>Login</h1>
                      <input name="uName" type="text" value="Username:" class="name" onfocus="clearDefault(this)">
                      <br>
                      <input type="password" name="uPw"  value="Password:" class="pass" onfocus="clearDefault(this)">
                      <input type="submit" name="Submit" value="OK" class="submit">
                    </form>
                  </div>                                                                
                                                                </td>
              </tr>
            </table>
          </td>
        </tr>
      </table>
                </td>
  </tr>
</table>

</body>
</html>
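
The page printed above is still the login form, and its inputs are named uName and uPw rather than username and j_password, so the POST in the first snippet probably never logged in. Here is a minimal Python 3 sketch (the code in the question is Python 2) that reads the field names from such a form with the standard-library HTMLParser. FormFieldParser, login_form_fields and SAMPLE are hypothetical names used for illustration only, and SAMPLE is a trimmed copy of the form from the output above.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect the action and the <input> name/value pairs of the first <form>."""

    def __init__(self):
        super().__init__()
        self.seen_form = False   # True once the first <form> has been opened
        self.in_form = False     # True while we are inside that first <form>
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self.seen_form:
            self.seen_form = True
            self.in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self.in_form and "name" in attrs:
            self.fields[attrs["name"]] = attrs.get("value", "")

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def login_form_fields(html):
    """Return (action, {field name: default value}) for the first form in html."""
    parser = FormFieldParser()
    parser.feed(html)
    return parser.action, parser.fields


# Trimmed copy of the login form from the printed index.php output.
SAMPLE = '''
<form name="flogin" method="post" action="/index.php">
  <input name="uName" type="text" value="Username:" class="name">
  <input type="password" name="uPw" value="Password:" class="pass">
  <input type="submit" name="Submit" value="OK" class="submit">
</form>
'''

action, fields = login_form_fields(SAMPLE)
print(action)          # /index.php
print(sorted(fields))  # ['Submit', 'uName', 'uPw']
```

If the server also checks the submit button, sending Submit=OK together with uName and uPw may be necessary. That is an assumption about this site, not something the output confirms.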

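You could try parsing the response from the first code section and requesting the extracted URL with the same opener. Without knowing the actual format of the link: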

import urllib, urllib2, cookielib, os
import re # going to use this now!

username = 'somebody'
password = 'somepass'

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# Field names taken from the login form in the question (uName / uPw)
login_data = urllib.urlencode({'uName' : username, 'uPw' : password})
resp = opener.open('http://www.some.site/index.php', login_data)
content = resp.read()
print content

# '.' and '?' are escaped so they match literally; an unescaped "php?t=" would never match
match = re.search(
    r"<a\s+href=\"(?P<file_link>http://www\.some\.site/excel_file\.php\?t=\d+)\">",
    content,
    re.IGNORECASE
)

assert match is not None, "Couldn't find the file link..."

file_link = match.group('file_link')
print "downloading {} with urllib2".format(file_link)
f = opener.open(file_link)
data = f.read()
with open("exceldoc.xls", "wb") as code:
    code.write(data)
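
For readers on Python 3, the same idea can be sketched with urllib.request and http.cookiejar. What matters is that the login and the download go through the same opener, so the PHPSESSID cookie set at login is sent with the file request. A bare urlopen call does not share that cookie jar. This is only a sketch under some assumptions: the field names uName, uPw and Submit come from the form shown in the question, the link is assumed to look like excel_file.php?t=<digits> and may be relative or absolute, and BASE_URL, find_file_link and download_report are hypothetical names.

```python
import re
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

BASE_URL = "http://www.some.site/"  # placeholder host from the question

# Assumed link shape: href="...excel_file.php?t=<digits>" (quotes may be ' or ")
LINK_RE = re.compile(r'href=["\']([^"\']*excel_file\.php\?t=\d+)["\']', re.IGNORECASE)


def find_file_link(html, base_url=BASE_URL):
    """Return the absolute download URL found in html, or None if absent."""
    match = LINK_RE.search(html)
    if match is None:
        return None
    # urljoin leaves absolute URLs untouched and resolves relative ones.
    return urllib.parse.urljoin(base_url, match.group(1))


def download_report(username, password, target="exceldoc.xls"):
    """Log in, locate the link, and save the file, all with one cookie-aware opener."""
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )
    form = urllib.parse.urlencode(
        {"uName": username, "uPw": password, "Submit": "OK"}
    ).encode("ascii")
    with opener.open(urllib.parse.urljoin(BASE_URL, "index.php"), form) as resp:
        page = resp.read().decode("utf-8", errors="replace")
    link = find_file_link(page)
    if link is None:
        raise RuntimeError("Couldn't find the file link; the login probably failed")
    # Same opener again, so the session cookie travels with this request.
    with opener.open(link) as resp, open(target, "wb") as out:
        out.write(resp.read())
    return link


# Offline check of the link extraction only; no network access is needed here.
print(find_file_link('<a href="excel_file.php?t=1303457489">xls</a>'))
```

Calling download_report('somebody', 'somepass') would perform the actual requests. If find_file_link returns None for the page that comes back after login, printing that page is a quick way to see whether the login form was returned again.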

