
How to access a password-protected site using Python?

I assumed that if I accessed a password-protected site using Python's mechanize, I would get a 401 Unauthorized error asking for authentication data.

So in my script I tried to open my Yahoo mailbox, which obviously requires a username and password. I expected a 401, but I didn't get one.

Code:

import mechanize

yahoo_mail = 'http://mail.cn.yahoo.com'
br = mechanize.Browser()
r = br.open(yahoo_mail)
print(r.info())  # here, I got 200, it's ok apparently

br.select_form(nr=0)  # select the first form on the page (the login form)
r = br.submit()  # submit the form without providing username and password
print(r.info())  # but I didn't get 401, why?

Questions:

  1. Why didn't I get a 401 when I provided no authentication info?
  2. If not my mailbox, is there any other website that would give me a 401?

Most websites these days do not use HTTP authentication, so a failed login does not return 401. Instead, a normal 200 response is returned, and the text of the page says that you are not logged in.
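To make the difference concrete, here is a minimal sketch using only the Python 3 standard library. It starts a throwaway local server with two made-up endpoints: `/basic`, which uses HTTP Basic authentication, and `/form`, which mimics a form-based login. The handler, the paths, and the error text are all assumptions for illustration and have nothing to do with how Yahoo actually works.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Toy server: /basic uses HTTP Basic auth, any POST mimics a login form."""

    def _reply(self, status, body, extra_headers=()):
        self.send_response(status)
        for name, value in extra_headers:
            self.send_header(name, value)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if self.path == "/basic":
            # HTTP authentication: no credentials means a 401 challenge.
            if self.headers.get("Authorization") is None:
                self._reply(401, b"auth required",
                            [("WWW-Authenticate", 'Basic realm="demo"')])
            else:
                self._reply(200, b"welcome")
        else:
            self._reply(404, b"not found")

    def do_POST(self):
        # Form login: consume the (empty) form and re-show the page with 200.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        self._reply(200, b"<html>Invalid ID or password. <form>...</form></html>")

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_status(url, data=None):
    """Return the HTTP status code, including error statuses such as 401."""
    # An empty ProxyHandler keeps the request local even if a proxy is configured.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, data=data) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def run_demo():
    """Start the toy server and return (status for /basic, status for /form)."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_address[1]
    try:
        return fetch_status(base + "/basic"), fetch_status(base + "/form", data=b"")
    finally:
        server.shutdown()
        server.server_close()
```

Calling `run_demo()` requests `/basic` without credentials and then posts an empty form to `/form`. Only the first request is rejected at the HTTP level; the second one "fails" purely in the text of the page.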

Instead, sites use cookies. This means that your browser does not actually know which sites it is logged in to. When you finally give Yahoo! a correct password, it either changes the cookie stored in your browser, or keeps the cookie the same and just changes the database record on its end that is associated with that cookie.
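The second variant, where the cookie stays the same and only the server-side record changes, might look roughly like this toy model. The `sessions` dict, the function names, and the `accounts` mapping are hypothetical stand-ins for whatever a real site keeps in its database.

```python
import secrets

# Server-side "database": session id (the cookie value) -> logged-in user or None.
sessions = {}


def first_visit():
    """Issue an anonymous session cookie, as a site might on the first request."""
    session_id = secrets.token_hex(8)
    sessions[session_id] = None  # nobody is logged in on this session yet
    return session_id


def log_in(session_id, username, password, accounts):
    """Mark the existing session as logged in if the credentials match."""
    if username in accounts and accounts[username] == password:
        sessions[session_id] = username  # the cookie itself does not change
        return True
    return False


def current_user(session_id):
    """Look up who, if anyone, the cookie belongs to."""
    return sessions.get(session_id)
```

With `accounts = {"alice": "s3cret"}` and `cookie = first_visit()`, a wrong password leaves `current_user(cookie)` as `None`, and a correct one makes it `"alice"`, all without the cookie value ever changing. The browser just keeps sending the same cookie and has no idea which state it is in.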

So HTTP status codes are generally useless during login. Instead, you will have to scrape the text of the 200 page that comes back to see whether it congratulates you on logging in or repeats the form. Alternatively, you can just check the URL of the page you get back and see whether it is the login form again or the destination you wanted to visit.
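Both checks can be combined in a small helper. This is only a sketch: the default `login_path` and the `failure_markers` text are assumptions, and you would have to replace them with whatever the site you are scraping actually shows.

```python
from urllib.parse import urlsplit


def login_succeeded(final_url, page_text, login_path="/login",
                    failure_markers=("Invalid ID or password",)):
    """Guess whether a login worked from the final URL and the page text."""
    # Landing on the login page again usually means the form was re-shown.
    if urlsplit(final_url).path.rstrip("/") == login_path.rstrip("/"):
        return False
    # Otherwise look for an error message the site is known to display.
    lowered = page_text.lower()
    return not any(marker.lower() in lowered for marker in failure_markers)
```

With the script from the question you could pass it `r.geturl()` and the decoded result of `r.read()` after `br.submit()`. It is a heuristic, so expect to adjust it whenever the site changes its login page.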

  1. A failed authentication doesn't mean you're not allowed to see the page behind the authentication. It means you won't see the version of the page that takes your credentials into account. If you're on a homepage and fail to authenticate, you can still see the homepage.

  2. Search engines don't seem to index pages that return 401, so one can be a bit hard to find...

It looks like Yahoo just handles the password authentication in their own code. Try adding the following two lines to your code:

with open('a.html', 'wb') as f:  # binary mode; the file is closed automatically
    f.write(r.read())

When you open the saved page, you will see the same login page again.

It looks like they just have a bit of JavaScript that tells you your password was wrong.


 