
How to access a password protected site using Python?

I was thinking that if I accessed a password protected site using Python's mechanize, I would get a 401 Unauthorized error, which asks for authentication data.

So inside my script I tried to access my Yahoo mailbox, which obviously needs a username and password. I expected a 401, but I didn't get one.

Code:

import mechanize

yahoo_mail = 'http://mail.cn.yahoo.com'
br = mechanize.Browser()
r = br.open(yahoo_mail)
print(r.code)  # here I got 200, which is fine: it is just the login page

br.select_form(nr=0)  # select the first form on the page (the login form)
r = br.submit()  # submit the form without providing a username and password
print(r.code)  # but I didn't get 401, why?

Question:

  1. Why didn't I get a 401 when I provided no authentication info?
  2. If not my mailbox, is there any other website that would give me a 401?

Most web sites these days do not use HTTP Authentication. So 401 is not returned if you fail to log in; instead, a normal 200 successful response is returned, and the text inside the web page says you did not log in.
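For contrast, here is a minimal sketch of what HTTP Authentication looks like when a server does use it. It is not Yahoo's setup: it starts a throwaway local server (the `BasicAuthHandler` class, the `alice:secret` credentials, and the `fetch_status` and `demo` helpers are all made up for this illustration) that answers 401 until the request carries a matching `Authorization` header.

```python
import base64
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical credentials, used only by this demo.
EXPECTED = "Basic " + base64.b64encode(b"alice:secret").decode("ascii")

# Opener that ignores any proxy settings, so the loopback request goes direct.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class BasicAuthHandler(BaseHTTPRequestHandler):
    """Toy server: 401 unless the Authorization header matches EXPECTED."""

    def do_GET(self):
        if self.headers.get("Authorization") == EXPECTED:
            body = b"welcome"
            self.send_response(200)
        else:
            body = b"authentication required"
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="demo"')
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_status(url, auth_header=None):
    """Return the HTTP status code of a GET request to url."""
    request = urllib.request.Request(url)
    if auth_header:
        request.add_header("Authorization", auth_header)
    try:
        with OPENER.open(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises HTTPError for 4xx/5xx responses


def demo():
    """Request the page without and then with credentials."""
    server = HTTPServer(("127.0.0.1", 0), BasicAuthHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_port
    try:
        return fetch_status(url), fetch_status(url, EXPECTED)
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` makes one request without credentials and one with them, so the two returned status codes show the 401 that the question expected next to the 200 that follows a successful Basic login.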

Instead, sites use cookies. This means that your browser does not actually know what sites it is logged into; when you finally provide a successful password to Yahoo!, it either changes the cookie it has stored on your browser, or maybe even keeps the cookie the same but just changes the database record on their end that is associated with the cookie.
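The cookie flow described above can be sketched with another throwaway local server. Everything here is an assumption made for illustration (the `CookieLoginHandler` class, the fixed `abc123` session id, the `password=secret` form body, the `request` and `demo` helpers); a real site would generate random session ids and parse the form properly. The point is only that every response is a 200 and that logging in changes a record on the server, not the cookie.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

LOGGED_IN = set()       # server-side "database" of sessions that have logged in
SESSION_ID = "abc123"   # fixed id for the demo only

# Opener that ignores any proxy settings, so the loopback request goes direct.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class CookieLoginHandler(BaseHTTPRequestHandler):
    """Toy server that tracks login state by session cookie."""

    def _reply(self, text):
        body = text.encode("utf-8")
        self.send_response(200)  # always 200, logged in or not
        self.send_header("Set-Cookie", "session=%s; Path=/" % SESSION_ID)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        session = self.headers.get("Cookie", "").partition("session=")[2]
        self._reply("inbox" if session in LOGGED_IN else "please log in")

    def do_POST(self):
        # Any POST is treated as a login attempt in this sketch.
        length = int(self.headers.get("Content-Length", 0))
        form = self.rfile.read(length).decode("utf-8")
        if form == "password=secret":
            LOGGED_IN.add(SESSION_ID)  # cookie unchanged; only the server record changes
            self._reply("inbox")
        else:
            self._reply("wrong password, please log in")

    def log_message(self, *args):
        pass  # keep the demo quiet


def request(url, cookie=None, data=None):
    """Send GET (or POST if data is given); return (status, cookie, body)."""
    req = urllib.request.Request(url, data=data)
    if cookie:
        req.add_header("Cookie", cookie)
    with OPENER.open(req) as resp:
        new_cookie = resp.headers.get("Set-Cookie", "").split(";")[0]
        return resp.status, new_cookie, resp.read().decode("utf-8")


def demo():
    """Visit, fail to log in, log in, then visit again with the same cookie."""
    server = HTTPServer(("127.0.0.1", 0), CookieLoginHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_port
    try:
        first = request(url)
        cookie = first[1]
        failed = request(url + "login", cookie, b"password=wrong")
        ok = request(url + "login", cookie, b"password=secret")
        again = request(url, cookie)
        return [first, failed, ok, again]
    finally:
        server.shutdown()
        server.server_close()
```

In this sketch the client sends the same `session=abc123` cookie before and after logging in, and the failed login attempt comes back as a 200 just like the successful one.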

So HTTP status codes are generally useless during the process of logging in. Instead you will have to scrape the text of the "200 Success" page that comes back to see if it congratulates you on logging in or repeats the form; or, alternately, you might just check the URL of the page you get back, and see whether it is the login form again, or whether it is instead the destination that you wanted to visit.
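A rough sketch of that check is below. The function name `login_succeeded`, the default `/login` path, and the failure phrases are placeholders invented for this example; you would replace them with whatever the site you are scraping actually shows. With mechanize, the final URL is available from `r.geturl()` and the page text from `r.read()`.

```python
from urllib.parse import urlsplit


def login_succeeded(final_url, page_text, login_path="/login",
                    failure_markers=("invalid password", "please sign in")):
    """Guess whether a form login worked, from the URL and text that came back.

    login_path and failure_markers are hypothetical defaults; adjust them
    to the site being scraped.
    """
    # Landing on the login form again means the login did not take.
    if urlsplit(final_url).path.rstrip("/") == login_path.rstrip("/"):
        return False
    # Otherwise look for text that the site only shows after a failed login.
    lowered = page_text.lower()
    return not any(marker.lower() in lowered for marker in failure_markers)
```

For example, `login_succeeded("http://example.com/login", "Please sign in")` is `False` because the URL is the login form again, while `login_succeeded("http://example.com/inbox", "Welcome back")` is `True`.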

  1. A failed authentication doesn't mean you're not allowed to see the page behind the login. It means you won't see the version of the page that takes your credentials into account. If you're on a homepage and you fail to authenticate, you can still see the homepage.

  2. Search engines don't seem to index pages that return 401, so an example can be a bit hard to find.

It looks like Yahoo just handles the password authentication in its own application code rather than through HTTP Authentication. Try adding the following two lines to your code:

with open('a.html', 'wb') as f:  # binary mode: r.read() returns the raw response body
    f.write(r.read())

When you open a.html, you will see the same login page again.

It looks like they just have a bit of JavaScript that tells you your password was wrong.
