
Logging into website and scraping data

The website I am trying to log in to is https://realitysportsonline.com/RSOLanding.aspx. I can't get the login to work, because the process is a little different from a typical site that has a dedicated login page. I don't get any errors, but the login doesn't take effect, so the request for main is redirected to the homepage.
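Because the landing page has no separate login page, one way to see what a plain form submission would need is to list the forms and named fields in the page's HTML. The sketch below uses only the standard library; FormInspector is a hypothetical helper, not part of any library. Only controls with a name attribute are submitted with a form, so a button's CSS classes never become a field name. If the login box's fields turn out not to sit inside a form, the login is probably sent by JavaScript, and the browser's developer tools (Network tab) will show the request that is actually made.

```python
from html.parser import HTMLParser

# Form controls that are submitted with a form when they carry a name attribute.
FIELD_TAGS = {"input", "select", "textarea", "button"}


class FormInspector(HTMLParser):
    """Hypothetical helper: list each <form>'s action/method and its named fields."""

    def __init__(self):
        super().__init__()
        self.forms = []          # one dict per <form> element
        self.loose_inputs = []   # named fields that sit outside any <form>
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._in_form = True
            self.forms.append({
                "action": attrs.get("action"),
                "method": (attrs.get("method") or "get").lower(),
                "inputs": [],
            })
        elif tag in FIELD_TAGS and attrs.get("name"):
            # Only the name attribute becomes a field name; CSS classes are never sent.
            target = self.forms[-1]["inputs"] if self._in_form else self.loose_inputs
            target.append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


if __name__ == "__main__":
    sample = (
        '<form action="/x" method="POST"><input name="user">'
        '<input type="submit" value="Go"></form>'
        '<input name="stray"><button class="btn">Log In</button>'
    )
    inspector = FormInspector()
    inspector.feed(sample)
    print(inspector.forms)         # [{'action': '/x', 'method': 'post', 'inputs': ['user']}]
    print(inspector.loose_inputs)  # ['stray']
```

On the real site you would feed it the landing page's HTML, for example inspector.feed(requests.Session().get(url).text).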

import requests
url = "https://realitysportsonline.com/RSOLanding.aspx"
main = "https://realitysportsonline.com/SetLineup_Contracts.aspx?leagueId=3000&viewingTeam=1"
data = {"username": "", "password": "", "vc_btn3 vc_btn3-size-md vc_btn3-shape-rounded vc_btn3-style-3d vc_btn3-color-danger" : "Log In"}
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
          'Referer':  'https://realitysportsonline.com/RSOLanding.aspx', 
          'Host':  'realitysportsonline.com',
          'Connection':   'keep-alive',
          'Accept-Language':    'en-US,en;q=0.5',
          'Accept-Encoding':    'gzip, deflate, br',
          'Accept':  'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8'}

s = requests.session()
s.get(url)
r = s.post(url, data, headers=header)

page = requests.get(main)

First of all, you create a session, but even assuming your POST request worked, you then request an authorised page without using the session you created.

You need to make the request with the s object you created, like so: page = s.get(main)
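To see why this matters without touching the real site, the sketch below prepares the same GET request two ways and compares the Cookie header each would send. The cookie name and value are hypothetical stand-ins for whatever a successful login would set. requests.get() builds a brand-new Session internally, so it starts with an empty cookie jar.

```python
import requests

MAIN = "https://realitysportsonline.com/SetLineup_Contracts.aspx?leagueId=3000&viewingTeam=1"

s = requests.Session()
# Stand-in for the cookie a successful login would set (hypothetical name and value).
s.cookies.set("example_session", "abc123", domain="realitysportsonline.com", path="/")

# Prepared through the logged-in session, as s.get(main) would be.
with_session = s.prepare_request(requests.Request("GET", MAIN))
# Prepared through a fresh session, which is what requests.get(main) uses internally.
without_session = requests.Session().prepare_request(requests.Request("GET", MAIN))

print(with_session.headers.get("Cookie"))     # example_session=abc123
print(without_session.headers.get("Cookie"))  # None
```

Nothing is sent over the network here; prepare_request only builds the request that would be sent.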

However, there were also a few issues with your POST request. You were posting to the landing page instead of the /Login route (https://realitysportsonline.com/Services/AccountService.svc/Login), and you were missing the Content-Type header. The code below also sends the credentials as JSON (json=payload) rather than as form data.
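The difference between the two POST requests can be inspected offline by preparing them without sending them. This is a sketch with placeholder credentials, and the first request is a simplified version of the question's (the extra button-class key is left out). When json= is used, requests would normally set Content-Type to application/json, but an explicitly supplied Content-Type header is kept instead.

```python
import json

import requests

LANDING = "https://realitysportsonline.com/RSOLanding.aspx"
LOGIN = "https://realitysportsonline.com/Services/AccountService.svc/Login"
creds = {"username": "me", "password": "secret"}  # placeholder credentials

# Simplified version of the question's request: form-encoded fields sent to the landing page.
old = requests.Request("POST", LANDING, data=creds).prepare()

# The suggested request: a JSON body sent to the Login service, with an explicit Content-Type.
new = requests.Request("POST", LOGIN, json=creds,
                       headers={"Content-Type": "text/json"}).prepare()

print(old.url, old.headers["Content-Type"], old.body)
print(new.url, new.headers["Content-Type"], json.loads(new.body))
```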

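import requests

# Post to the Login service endpoint rather than to the landing page
url = "https://realitysportsonline.com/Services/AccountService.svc/Login"
main = "https://realitysportsonline.com/LeagueSetup.aspx?create=true"
payload = {"username": "", "password": ""}  # fill in your credentials
headers = {
    'Content-Type': "text/json",
    'Cache-Control': "no-cache"
}

# A Session keeps the cookies set by the login response for later requests
s = requests.Session()
response = s.post(url, json=payload, headers=headers)
response.raise_for_status()  # raise if the login call returned an HTTP error status

# Reuse the same session (not requests.get) for pages that need authentication
page = s.get(main)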

PS: your main request URL redirects to the homepage even with a valid session (at least for me).
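Since a silent redirect is the main symptom here, it can help to check for it explicitly. requests follows redirects for GET by default, records the intermediate responses in response.history, and leaves the final address in response.url. The helper below is a hypothetical sketch built on those attributes; the demo uses hand-built Response objects so it needs no network access.

```python
from urllib.parse import urlparse

import requests


def redirected_elsewhere(resp: requests.Response, wanted_url: str) -> bool:
    """Hypothetical check: did following redirects land somewhere other than wanted_url?"""
    final, wanted = urlparse(resp.url), urlparse(wanted_url)
    # resp.history holds the intermediate redirect responses; empty means no redirect.
    return bool(resp.history) and (final.netloc, final.path.lower()) != (
        wanted.netloc, wanted.path.lower())


if __name__ == "__main__":
    main = "https://realitysportsonline.com/SetLineup_Contracts.aspx?leagueId=3000&viewingTeam=1"
    # Fake responses built by hand, so this demo needs no network access.
    bounced = requests.Response()
    bounced.url = "https://realitysportsonline.com/RSOLanding.aspx"
    bounced.history = [requests.Response()]
    direct = requests.Response()
    direct.url = main
    print(redirected_elsewhere(bounced, main))  # True
    print(redirected_elsewhere(direct, main))   # False
```

After logging in, something like redirected_elsewhere(s.get(main), main) would flag pages that bounce you back to the homepage, whether because the login failed or because the page itself redirects.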
