Logging into a forum using Python Requests

I am trying to log into a forum using Python Requests. This is the forum I'm trying to log into: http://fans.heat.nba.com/community/

Here's my code:

import requests
import sys

URL = "http://fans.heat.nba.com/community/index.php?app=core&module=global&section=login"

def main():
    session = requests.Session()

    # This is the form data that the page sends when logging in
    login_data = {
        'ips_username': 'username',
        'ips_password': 'password',
        'signin_options': 'submit',
        'redirect':'index.php?'
    }

    r = session.post(URL, data=login_data)

    # Try accessing a page that requires you to be logged in
    q = session.get('http://fans.heat.nba.com/community/index.php?app=members&module=messaging&section=view&do=showConversation&topicID=4314&st=20#msg26627')
    print(session.cookies)
    print(r.status_code)
    print(q.status_code)

if __name__ == '__main__':
    main()

The URL is the login page of the forum. With the 'q' variable, the session tries to access a page on the forum (the private messenger) that is only available when you're logged in. However, that request returns status code 403, which means the login did not succeed.

Why am I unable to log in? In 'login_data', 'ips_username' and 'ips_password' are the names of the HTML form fields. However, I believe I have the other login parameters ('signin_options', 'redirect') wrong.

Can somebody point me to the correct login parameters, please?

There is a hidden input named auth_key in the form:

<input type='hidden' name='auth_key' value='880ea6a14ea49e853634fbdc5015a024' />

So you need to parse it and send it along with the login request. You could simply use a regex:

import re
import requests

def main():
    session = requests.Session()

    # Get the login page that contains the auth_key. Using the session (rather
    # than requests.get) means cookies set here are reused for the login POST.
    r = session.get("http://fans.heat.nba.com/community/index.php?app=core&module=global&section=login")
    # Extract the value of the hidden auth_key input
    auth_key = re.findall("auth_key' value='(.*?)'", r.text)[0]

    # This is the form data that the page sends when logging in
    login_data = {
        'ips_username': 'username',
        'ips_password': 'password',
        'auth_key': auth_key,
    }

And the rest should be the same.
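Note that re.findall(...)[0] raises an IndexError if the page does not contain the field. As a minimal sketch (the helper name build_login_payload is hypothetical, and the sample HTML is just the hidden input quoted above), the extraction can be wrapped so that a missing key gives a clearer error:

```python
import re

# Same pattern as in the snippet above: captures the value of the hidden auth_key field.
AUTH_KEY_PATTERN = re.compile(r"auth_key' value='(.*?)'")


def build_login_payload(html, username, password):
    """Return the login form data, including the auth_key found in `html`.

    Hypothetical helper for illustration; it is not part of Requests or the forum.
    """
    match = AUTH_KEY_PATTERN.search(html)
    if match is None:
        raise ValueError("auth_key not found in the login page")
    return {
        'ips_username': username,
        'ips_password': password,
        'auth_key': match.group(1),
    }


# Offline check against the hidden input shown earlier (no network access needed).
sample_html = "<input type='hidden' name='auth_key' value='880ea6a14ea49e853634fbdc5015a024' />"
payload = build_login_payload(sample_html, 'username', 'password')
print(payload['auth_key'])
```

In the real script you would pass r.text as html and then call session.post(URL, data=payload).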

As indicated by @Chaker in the comments, the login form requires you to send an auth_key that you need to read from an initial visit to a page first.

The auth_key is a hidden form field with a random value (generated and stored by the server), so every regular web browser sends that with the POST request. The server then validates the request and requires it to contain an auth_key that it knows is valid (by checking against its list of issued auth_keys). So the process needs to be as follows:

  • Visit the front page (or any page below that probably)
  • Read the value of the auth_key hidden field
  • Create a POST request that includes your credentials and that auth_key

So this seems to work:

import re
import requests

USERNAME = 'username'
PASSWORD = 'password'

AUTH_KEY = re.compile(r"<input type='hidden' name='auth_key' value='(.*?)' \/>")

BASE_URL = 'http://fans.heat.nba.com/community/'
LOGIN_URL = BASE_URL + 'index.php?app=core&module=global&section=login&do=process'
SETTINGS_URL = BASE_URL + 'index.php?app=core&module=usercp'

payload = {
    'ips_username': USERNAME,
    'ips_password': PASSWORD,
    'rememberMe': '1',
    'referer': 'http://fans.heat.nba.com/community/',
}

with requests.Session() as session:
    response = session.get(BASE_URL)
    auth_key = AUTH_KEY.search(response.text).group(1)
    payload['auth_key'] = auth_key
    print("auth_key: %s" % auth_key)

    response = session.post(LOGIN_URL, data=payload)
    print("Login Response: %s" % response)

    response = session.get(SETTINGS_URL)
    print("Settings Page Response: %s" % response)

assert "General Account Settings" in response.text

Output:

auth_key: 777777774ea49e853634fbdc77777777
Login Response: <Response [200]>
Settings Page Response: <Response [200]>

AUTH_KEY is a regular expression that matches any text of the form <input type='hidden' name='auth_key' value='?????' />, where ????? is a group of zero or more characters (non-greedy, which means it looks for the shortest match). The documentation on the re module should get you started with regular expressions. You can also paste the regular expression into an online regex tester to have it explained and toy around with it.
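To see why the non-greedy group matters, here is a small sketch that runs both variants against a made-up HTML fragment. The second hidden input (referer) is an assumption added only so that there is a later quote character for the greedy version to run into:

```python
import re

# Hypothetical fragment: the auth_key input from above followed by another hidden input.
html = (
    "<input type='hidden' name='auth_key' value='880ea6a14ea49e853634fbdc5015a024' />"
    "<input type='hidden' name='referer' value='index.php' />"
)

# Non-greedy: (.*?) stops at the first closing quote after value='
non_greedy = re.search(r"name='auth_key' value='(.*?)'", html).group(1)

# Greedy: (.*) runs on to the last quote in the string, swallowing the next tag
greedy = re.search(r"name='auth_key' value='(.*)'", html).group(1)

print(non_greedy)
print(greedy)
```

The non-greedy match yields only the key, while the greedy match also captures the markup that follows it.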

Note: If you were to actually parse (X)HTML, you should always use an (X)HTML parser. However, for this quick and dirty way to extract the hidden form field, a non-greedy regex does the job just fine.
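For comparison, a parser-based version could look roughly like this. It is only a sketch using html.parser from the standard library; the class name HiddenInputParser is made up for this example, and the sample HTML is again the hidden input quoted earlier rather than the real page:

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of all <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden_fields = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <input ... />
        if tag != 'input':
            return
        attributes = dict(attrs)
        if attributes.get('type') == 'hidden' and 'name' in attributes:
            self.hidden_fields[attributes['name']] = attributes.get('value') or ''


sample_html = "<form><input type='hidden' name='auth_key' value='880ea6a14ea49e853634fbdc5015a024' /></form>"
parser = HiddenInputParser()
parser.feed(sample_html)
print(parser.hidden_fields['auth_key'])
```

Unlike the regex, this does not depend on the quoting style or the attribute order of the input tag.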
