How do I log in to this site with scrapy shell and Python - 401 error?

I'm trying to log in to this website, seeking.com/login, through scrapy shell. I also installed Burp Suite to analyze its URLs, headers, etc.

# Run inside `scrapy shell`, where fetch() is available as a shortcut
from scrapy.http import FormRequest
frmdata = {"captcha":"","email":"MYEMAIL.com","password":"MY_PASSWORD","is_rememberme":"0","locale":"en_US","auth_type":"bearer_token","date":"2018-12-13T09:56:22.957Z"}

url = "https://www.seeking.com/v3/auth/login"
r = FormRequest(url, formdata=frmdata)
fetch(r)

With this code I get an HTTP 401 error, which as far as I can tell is essentially an authentication error.

I forwarded the calls through Burp Suite and got the following intercept:

POST /v3/auth/login HTTP/1.1
Host: www.seeking.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:63.0) Gecko/20100101 Firefox/63.0
Accept: application/json, text/plain, */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: https://www.seeking.com/login?fromLogout=1
Content-Type: application/json;charset=utf-8
Web-Version: 3.59
Authorization: Basic NTI2ZTIwYzExMDI0NDYzNTk5OTI5MzUwZThiNWEzMTI6bHN0emd4ZzpSRzRzS3VmdEJMRTQxMm92TnMxbDR6L0ZkZ1dESHZuM2wwZWxtYWhyMGtnPQ==
Content-Length: 166
Connection: close
Cookie: __cfduid=dcf9fd66583d55382f362c18a83d904ca1544519479; _gcl_au=1.1.2035701377.1544519485; _ga=GA1.2.1740241044.1544519486; com.silverpop.iMAWebCookie=e88c45d1-3c24-11c6-089e-e287aae2c678; __cfruid=3eebbdc1e401ed560c23a7c474c41e59b2e93018-1544520179; device_cookie=1; __gads=ID=a1e437c03ddad1b3:T=1544519579:S=ALNI_MYb30xY4z76J4NniCK_ZtOyOdPMKA; _lb_user=gfpuzje6kg; seeking_session=eyJpdiI6Im4yMTNJNVNRZjkxbnZzMmNpYnQ4dkE9PSIsInZhbHVlIjoiVGhGVUJDejc1dElJbEwxekh5d2hXUnhjeDlpVWR2dW9IWWJqeDZvRmI3VU9Pc1lpZXZGWGJxejQ1alNXbGVXUGJqaEpORU9LNFJITVh0N3IwR1E0bUE9PSIsIm1hYyI6IjUyODU3MWIxYjM3MGU3M2E0YjI1YzM2MzNmNDc5ZDMzZDdjYTg1ZWMxYWU2ODJjY2JlMTJmZWJlNmUyZDkyNWMifQ%3D%3D

{"captcha":"","email":"MYEMAIL","password":"MYPASS","is_rememberme":0,"locale":"en_US","auth_type":"bearer_token","date":"2018-12-14T09:15:56.016Z"}

I am completely new to this and have spent two days trying to figure out what I need to pass to this POST to log in.

My questions are:

1) Based on this intercept, what should my request via FormRequest look like?

2) I see cookies and authorization tokens being passed in the POST (an Authorization header that changes with each POST, session cookies, etc.). Where do they come from? How do I get them while scraping so that I can log in successfully?

3) Do I need to store these session variables when scraping other pages on the site after login? Is there anything special I need to do to stay logged in and access other pages?

It looks like the login endpoint expects JSON data, not a URL-encoded string (which is what FormRequest creates).
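To see the difference concretely, here is a standard-library sketch that builds both bodies from the same placeholder `frmdata`. `urlencode` only approximates what `FormRequest` does internally (it url-encodes `formdata` in a similar way), and switching `is_rememberme` to the integer `0` mirrors the body captured in Burp rather than anything the site documents.

```python
import json
from urllib.parse import urlencode

# Placeholder credentials, as in the question.
frmdata = {
    "captcha": "",
    "email": "MYEMAIL.com",
    "password": "MY_PASSWORD",
    "is_rememberme": "0",
    "locale": "en_US",
    "auth_type": "bearer_token",
    "date": "2018-12-13T09:56:22.957Z",
}

# Roughly what FormRequest sends (Content-Type: application/x-www-form-urlencoded).
form_body = urlencode(frmdata)

# What the browser sent in the intercept (Content-Type: application/json);
# there, is_rememberme is a number rather than a string.
json_body = json.dumps(dict(frmdata, is_rememberme=0))

print(form_body)
print(json_body)
```

A server that parses the request body as JSON will not find any fields in `form_body`, which would be consistent with the 401.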

Something like this should work:

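import json
import scrapy

# POST the same credentials as a JSON body instead of a url-encoded form
r = scrapy.Request(
    url=url,
    method='POST',
    body=json.dumps(frmdata),
    headers={'Content-Type': 'application/json'},
)
fetch(r)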

The tokens, cookies, etc. are probably created when you first request the login page, so you may need to request the login page before trying to log in.
It's possible that some of them are generated with JavaScript (I haven't checked), in which case you may need to dig through the JS code to figure out what's going on, or even execute the JS yourself (e.g. using a browser).
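One starting point for that digging: the `Authorization: Basic` header in the intercept uses the Basic scheme (RFC 7617), where the part after `Basic ` is base64-encoded text, normally `user-id:password`. A minimal stdlib sketch for decoding it; `decode_basic_auth` is a hypothetical helper name, meant for inspection only.

```python
import base64
from typing import List


def decode_basic_auth(header_value: str) -> List[str]:
    """Decode an 'Authorization: Basic ...' value and split it on ':'.

    Hypothetical inspection helper. Per RFC 7617 the decoded text is normally
    'user-id:password'; splitting on every ':' also exposes any extra parts
    (a real password may itself contain ':').
    """
    scheme, _, encoded = header_value.strip().partition(" ")
    if scheme.lower() != "basic":
        raise ValueError(f"not a Basic credential: {scheme!r}")
    decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    return decoded.split(":")


print(decode_basic_auth("Basic dXNlcjpwYXNz"))  # ['user', 'pass']
```

Applied to the captured header, it returns three colon-separated values rather than a plain user/password pair, so the value appears to be assembled by the site's JavaScript; what each part means is not documented and would have to be traced in that code.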

Scrapy keeps track of your session (cookies) for you, so there's nothing you need to do to stay logged in.
