简体   繁体   中英

HTTP Error 403: Forbidden w/ User Agents Added

Usually I've been able to get around 403 Errors once I've added a known User Agent but I'm now trying to login and then eventually scrape and cannot figure out how to bypass this error.

Code:

import urllib
import http.cookiejar

cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)
authentication_url = 'https://www.linkedin.com/'
payload = {
    'session_key': 'email',
    'session_password': 'password'
}
data = urllib.parse.urlencode(payload)
binary_data = data.encode('UTF-8')
req = urllib.request.Request(authentication_url, binary_data)
resp = urllib.request.urlopen(req)
contents = resp.read()

Traceback :

    Traceback (most recent call last):
  File "C:/Python34/loginLinked.py", line 16, in <module>
    resp = urllib.request.urlopen(req)
  File "C:\Python34\lib\urllib\request.py", line 161, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python34\lib\urllib\request.py", line 469, in open
    response = meth(req, response)
  File "C:\Python34\lib\urllib\request.py", line 579, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python34\lib\urllib\request.py", line 507, in error
    return self._call_chain(*args)
  File "C:\Python34\lib\urllib\request.py", line 441, in _call_chain
    result = func(*args)
  File "C:\Python34\lib\urllib\request.py", line 587, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

See my answer to this question:

why isn't Requests not signing into a website correctly?

I should start with stating that you really should use their API: http://developer.linkedin.com/apis

There does not seem to be any POST login on the frontpage of linkedin using those parameters?

This is the login URL you must POST to: https://www.linkedin.com/uas/login-submit

Be aware that this probably wont work either, as you need at least the csrfToken parameter from the login form.

You probably need the loginCsrfParam too, also from the login form on the frontpage.

Something like this might work. Not tested, you might need to add the other POST parameters.

import requests
s = requests.session()

def get_csrf_tokens():
    url = "https://www.linkedin.com/"
    req = s.get(url).text

    csrf_token = req.split('name="csrfToken" value=')[1].split('" id="')[0]
    login_csrf_token = req.split('name="loginCsrfParam" value="')[1].split('" id="')[0]

    return csrf_token, login_csrf_token


def login(username, password):
    url = "https://www.linkedin.com/uas/login-submit"
    csrfToken, loginCsrfParam = get_csrf_tokens()

    data = {
        'session_key': username,
        'session_password': password,
        'csrfToken': csrfToken,
        'loginCsrfParam': loginCsrfParams
    }

    req = s.post(url, data=data)

login('username', 'password')

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM