Can't log in to a site using requests

I'm trying to log in to this site using the requests module, but I get a 403 status code every time with the attempt below. Although I tried to mimic the way the request is sent by monitoring dev tools, I can't make it work. The credentials used here (username: simpndev@gmail.com, password: +agb5E2?w2pQJ3z) are for test purposes only, so you are free to use them.

To get to the form, all you need to do is click on the login button and then on the Fantasy button.

I've tried with:

import re
import requests

link = 'https://www.fanduel.com/contests'
url = 'https://api.fanduel.com/sessions'

payload = {"email":"simpndev@gmail.com","password":"+agb5E2?w2pQJ3z","product":"DFS"}

def log_in(s):
    r = s.get(link)
    client_id = re.findall(r"clientId\":\"(.*?)\",",r.text)[0]
    s.headers['authorization'] = f'Basic {client_id}'
    s.headers['Referer'] = 'https://www.fanduel.com/login'
    s.headers['accept'] = 'application/json'
    r = s.post(url,json=payload)
    print(r.status_code)

if __name__ == '__main__':
    with requests.Session() as s:
        s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36'
        log_in(s)

I had success using selenium, but I don't wish to go that route.

How can I log in to that site using requests?

There are two potential 403 errors involved in this request: the 403 from the bot protection at link = 'https://www.fanduel.com/contests', and a 403 from the request to /sessions.

The 403 from link comes from bot protection that uses some advanced browser features to check for repeated login attempts. Defeating that is a more complicated subject, involving comparing User-Agent strings against HTTP/2 traffic, beating captchas with ML, and so forth. My advice is not to go down that path.

Instead, lower the version of your user-agent string and make sure you're providing the correct headers. You're not providing the correct headers on the initial request, so you're generating 403s; then you're on a bot blacklist and trying to manage that.

The following works for me, as shown in the debugger screenshot below:

import re
import requests

link = 'https://www.fanduel.com/contests'
url = 'https://api.fanduel.com/sessions'

payload = {"email":"simpndev@gmail.com","password":"+agb5E2?w2pQJ3z","product":"DFS"}

def log_in(s):
    r = s.get(link)
    client_id = re.findall(r"clientId\":\"(.*?)\",",r.text)[0]
    s.headers['Authorization'] = f'Basic {client_id}'
    s.headers['Referer'] = 'https://www.fanduel.com/login?cc_success_url=%2Fcontests'
    s.headers['Accept'] = 'application/json'
    s.headers['Accept-Encoding'] = 'gzip, deflate, br'
    s.headers['Accept-Language'] = 'en-US,en;q=0.5'
    s.headers['Origin'] = "https://www.fanduel.com"
    r = s.post(url,json=payload)
    print(r.status_code)

if __name__ == '__main__':
    with requests.Session() as s:
        s.headers['User-Agent'] = 'Mozilla/5.0 (en-us) AppleWebKit/534.14 (KHTML, like Gecko; Google Wireless Transcoder) Chrome/9.0.597 Safari/534.14 wimb_monitor.py/1.0'
        log_in(s)

Notice I modified your Referer, fixed the capitalization of your header keys, and provided some headers that might seem superfluous. When checking r.request.headers I saw differences between what requests sends and what e.g. Firefox sends, so I simply added in anything that differed.
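To do that comparison yourself, you can prepare a request without actually sending it and dump the final header set requests would transmit; a minimal sketch (the URL, user-agent, and payload below are placeholders, not the exact values from the answer):

```python
import requests

# Prepare a request without sending it, so the complete header set
# (including anything requests adds automatically, like Content-Type
# for json= payloads) can be inspected and diffed against dev tools.
session = requests.Session()
session.headers['User-Agent'] = 'Mozilla/5.0 (en-us) AppleWebKit/534.14'

req = requests.Request('POST', 'https://api.fanduel.com/sessions',
                       json={'email': 'user@example.com'})
prepared = session.prepare_request(req)

for name, value in sorted(prepared.headers.items()):
    print(f'{name}: {value}')
```

No network traffic is generated; prepare_request only builds the request object, which makes it a convenient way to spot header differences before burning attempts against a bot blacklist.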

Also notice that your bearer token and account credentials are now spread far and wide and may be contributing to additional 403s if you're still testing with them. You'll want a clean account, since many people may have these creds by now.

[debugger screenshot]

Been there; I had a lot of issues using Selenium for web scraping in the past.

One alternative I've come to prefer is using mitmproxy to dump the navigation to a file and then replaying it with requests, something like this:

import mitmproxy.io
import requests

def main():
    sess = requests.Session()
    # Read every flow recorded by mitmproxy from the dump file.
    with open('flows', 'rb') as f:
        data = list(mitmproxy.io.FlowReader(f).stream())
    for d in data:
        url = d.request.url
        raw_content = d.request.raw_content
        # Drop HTTP/2 pseudo-headers (':authority' etc.) and cookies;
        # the Session tracks cookies itself across the replay.
        headers = {k: v for k, v in d.request.headers.items()
                   if not k.startswith(':') and k.lower() != 'cookie'}
        print(url, raw_content, headers)
        if d.request.method == 'GET':
            r = sess.get(url, headers=headers, data=raw_content)
        if d.request.method == 'POST':
            r = sess.post(url, headers=headers, data=raw_content)
        print(r.status_code)
        if d.request.url == 'https://api.fanduel.com/sessions':
            break

if __name__ == '__main__':
    main()

The code above (using my own flow dumped from mitmproxy) produces the output below:

http://www.fanduel.com/ b'' {'Host': 'www.fanduel.com', 'Upgrade-Insecure-Requests': '1', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 13_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2 Mobile/15E148 Safari/604.1', 'Accept-Language': 'en-gb', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'keep-alive'}
200
https://www.fanduel.com/ b'' {'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 'upgrade-insecure-requests': '1', 'user-agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 13_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2 Mobile/15E148 Safari/604.1', 'accept-language': 'en-gb', 'accept-encoding': 'gzip, deflate'}
200
https://www.fanduel.com/JMCVuBG8/init.js b'' {'accept': '*/*', 'accept-encoding': 'gzip, deflate, br', 'user-agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 13_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2 Mobile/15E148 Safari/604.1', 'accept-language': 'en-gb', 'referer': 'https://www.fanduel.com/'}
200
https://www.fanduel.com/login?source=Header%20Login b'' {'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 'accept-encoding': 'gzip, deflate, br', 'user-agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 13_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2 Mobile/15E148 Safari/604.1', 'accept-language': 'en-gb', 'referer': 'https://www.fanduel.com/'}
200 
https://api.fanduel.com/sessions b'{"email":"simpndev@gmail.com","password":"+agb5E2?w2pQJ3z","product":"DFS"}' {'accept': 'application/json', 'origin': 'https://www.fanduel.com', 'content-type': 'application/json', 'authorization': 'Basic ZWFmNzdmMTI3ZWEwMDNkNGUyNzVhM2VkMDdkNmY1Mjc6', 'referer': 'https://www.fanduel.com/login', 'content-length': '75', 'accept-language': 'en-gb', 'user-agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 13_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.2 Mobile/15E148 Safari/604.1', 'accept-encoding': 'gzip, deflate, br'}
201

All you'll need is a device whose proxy is set to pass traffic through your own mitmproxy installation (I usually route requests from my phone through my PC over SOCKS5). If something changes in the site's navigation, all you need to do is record a new flow and dump it to a file (e.g. with mitmdump -w flows) to feed the Python program.

That won't defeat more complex JavaScript security implementations, but it should be enough for simple ones such as this website's.
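A recorded flow usually also contains trackers and asset requests that aren't needed for the login sequence. A small sketch of filtering replayed URLs by host before issuing them; the host list here is an assumption based on the URLs seen in the output above:

```python
from urllib.parse import urlparse

# Hosts that matter for the login sequence. Everything else in the
# dump (analytics, CDNs) can be skipped during replay. This list is
# an assumption for this particular site.
RELEVANT_HOSTS = {'www.fanduel.com', 'api.fanduel.com'}

def is_relevant(url: str) -> bool:
    """Return True when a recorded request should be replayed."""
    return urlparse(url).hostname in RELEVANT_HOSTS

print(is_relevant('https://api.fanduel.com/sessions'))
print(is_relevant('https://cdn.example.com/app.js'))
```

In the replay loop, a `continue` on non-relevant URLs keeps the session focused on the requests that actually establish the login state.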
