
Python requests: how to log in to a website

I am attempting to scrape this website, but it requires a login. I am struggling to log in successfully using the requests library in Python.

Looking through the form in the HTML, there are no hidden values. When I intercept HTTP requests in the browser console, the login POST request for the form contains username: "username here" and password: "password here".

I have also tried adjusting the headers, because I read that some servers might deny access to clients that do not send browser-like headers.
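If a browser-like User-Agent is needed, it can be set once on a requests Session so that every request the session sends carries it. The check below is a minimal offline sketch, assuming requests is installed. It prepares a request without sending it and inspects the headers. The User-Agent string is an arbitrary example, not something the site is known to require.

```python
import requests

# Arbitrary example of a browser-like User-Agent string (an assumption, not a site requirement).
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)

session = requests.Session()
# Headers stored on the session are merged into every request it sends.
session.headers.update({"User-Agent": BROWSER_UA})

# Build the request exactly as the session would, but do not send it.
prepared = session.prepare_request(
    requests.Request("GET", "https://scsctennis.gametime.net/auth")
)
print(prepared.headers["User-Agent"])
```

This replaces building a headers dict and passing it to each call, which is easy to forget on one of them.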

Here are my attempts:

import requests
from fake_useragent import UserAgent

ua = UserAgent()
headers = {"User-Agent": str(ua.chrome)}

payload = {"username": "username",
           "password": "password"
          }

login = requests.get("https://scsctennis.gametime.net/auth", headers=headers)

response = requests.post("https://scsctennis.gametime.net/auth",
                         data=payload, cookies=login.cookies, headers=headers)

print(response.text)

And also:

import requests
from fake_useragent import UserAgent

ua = UserAgent()
headers = {"User-Agent": str(ua.chrome)}

payload = {"username": "username",
           "password": "password"
          }

s = requests.Session()
login = s.get("https://scsctennis.gametime.net/auth", headers=headers)

response = s.post("https://scsctennis.gametime.net/auth", data=payload,
                  headers=headers)

print(response.text)

One thing I have noticed: after the POST request, if I try to view the cookies with print(response.cookies), there are none, but for the GET request, print(login.cookies) shows a cookie.
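An empty response.cookies after the POST is expected behavior in requests. A Response's cookies attribute holds only the cookies that this particular response set. A Session keeps every cookie it has received in session.cookies and sends those cookies on later requests. The sketch below demonstrates this offline against a throwaway local server. The handler and the gametime=abc123 cookie are made up for illustration and are not the real site's behavior.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class FakeLoginHandler(BaseHTTPRequestHandler):
    """Hypothetical local stand-in: GET sets a cookie, POST sets none."""

    def _reply(self, body: bytes, cookie: str = "") -> None:
        self.send_response(200)
        if cookie:
            self.send_header("Set-Cookie", cookie)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        self._reply(b"login page", cookie="gametime=abc123; Path=/")

    def do_POST(self):
        # Consume the form body, then echo back the Cookie header we received.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self._reply((self.headers.get("Cookie") or "").encode())

    def log_message(self, *args):
        pass  # silence per-request logging


server = HTTPServer(("127.0.0.1", 0), FakeLoginHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/auth"

with requests.Session() as s:
    s.trust_env = False  # ignore proxy settings from the environment
    login = s.get(url)
    response = s.post(url, data={"username": "u", "password": "p"})

server.shutdown()
server.server_close()

print(login.cookies.get_dict())     # cookies set by the GET response itself
print(response.cookies.get_dict())  # empty: the POST response set no cookie
print(s.cookies.get_dict())         # the session jar still holds the cookie
print(response.text)                # the Cookie header the POST actually sent
```

So with a Session, the cookie from the GET is still sent with the POST even though response.cookies is empty. The missing cookie is therefore not why the login fails.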

I have read through this blog and the requests documentation, and have gone through many Stack Overflow posts. Any help would be appreciated, thanks.

Edit: You are right, it posts to "https://scsctennis.gametime.net/auth/json-index". Here is the code changed to follow the recommendations.

import requests

# headers = {'x-requested-with': 'XMLHttpRequest'}
headers = {"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"}

payload = {"username": "username",
           "password": "password"
          }

s = requests.Session()
login = s.get("https://scsctennis.gametime.net/auth/json-index",
              headers=headers)
print(login.text)
response = s.post("https://scsctennis.gametime.net/auth/json-index",
                  data=payload, headers=headers)
print(response.text)

The output of each print statement, respectively:

{"code":505,"msg":"The username or password was not recognized. Please check the spelling and try again."}

{"code":202,"msg":"The username or password was not recognized. Please check the spelling and try again.","isStaff":false,"user":{"name":"Vuk"}}

I receive the 505 message simply by visiting the URL (the GET request), without posting to it.

The 202 message is what I get when I post to the URL. The username and password are correct, yet the message says they were not recognized. Not sure why? The "isStaff":false,"user":{"name":"Vuk"} part of the response is correct: that is the name associated with the credentials I am logging in with, and I am not a staff member.

Any thoughts on how to proceed?

Last edit: Got it working. Thanks for catching that I wasn't posting to the correct URL! It turns out the 202 response above means the login succeeded. The server recognizes my name as belonging to the credentials, and it simply returns that message text regardless. After the POST request, a GET request to the page I want returns a good response. Thanks!

import requests


payload = {"username": "username",
           "password": "password"
           }

s = requests.Session()

response = s.post("https://scsctennis.gametime.net/auth/json-index",
                  data=payload)
print(response.text)
stuff = s.get("http://scsctennis.gametime.net/scheduling/index/jsoncourtdata/sport/1/date/2017-12-25")

print(stuff.text)
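Below is a sketch of a slightly more defensive version of the login step above. It adds a timeout and checks both the HTTP status and the JSON body. The success test is a heuristic inferred only from the two responses quoted in the question: code 505 on failure, and code 202 with a "user" object on success. The site does not document this. The function names are hypothetical.

```python
import requests

LOGIN_URL = "https://scsctennis.gametime.net/auth/json-index"


def looks_logged_in(result: dict) -> bool:
    # Heuristic based only on the responses quoted above: the failed attempt
    # returned code 505; the successful one returned 202 plus a "user" object.
    return result.get("code") == 202 and "user" in result


def login(session: requests.Session, username: str, password: str) -> dict:
    """Hypothetical helper: post the form fields to the JSON login endpoint."""
    resp = session.post(
        LOGIN_URL,
        data={"username": username, "password": password},
        timeout=10,
    )
    resp.raise_for_status()  # raise on 4xx/5xx HTTP status codes
    result = resp.json()     # the endpoint answers with a JSON object
    if not looks_logged_in(result):
        raise RuntimeError(f"login rejected: {result.get('msg')}")
    # The session now holds any cookies the server set, so later
    # session.get(...) calls are made as the logged-in user.
    return result
```

Typical use would be to create a requests.Session(), call login(s, "username", "password"), and then fetch the scheduling URL with s.get(...) as in the code above.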

I see the form posts the credentials to "https://scsctennis.gametime.net/auth/json-index" and gets JSON in response.

Can you post to this endpoint instead of the one you are using?

Posting fake credentials to this endpoint:

curl "https://scsctennis.gametime.net/auth/json-index" \
  -H "Content-Type: application/x-www-form-urlencoded; charset=UTF-8" \
  -H "Cookie: gametime=ba3725642c5b55fe1123dec46e45e3a7" \
  --data "username=test&password=test"

returns an error like {"code":505,"msg":"The username or password was not recognized. Please check the spelling and try again."}
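For comparison, here is a sketch of the same request built with requests. The cookie value is copied from the curl command above; with a real Session the cookie would come from an earlier response instead. The request is prepared but not sent, which shows the headers and body requests would produce.

```python
import requests

# Mirror of the curl command above; the cookie value is copied from it.
req = requests.Request(
    "POST",
    "https://scsctennis.gametime.net/auth/json-index",
    data={"username": "test", "password": "test"},  # a dict is form-encoded
    cookies={"gametime": "ba3725642c5b55fe1123dec46e45e3a7"},
)
prepared = req.prepare()  # build the request without sending it

print(prepared.headers["Content-Type"])
print(prepared.headers["Cookie"])
print(prepared.body)
```

requests sends the bare application/x-www-form-urlencoded type without the "; charset=UTF-8" parameter that the curl command adds. Passing that Content-Type in headers= explicitly would make this header match curl.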
