Scrapy spider cannot log into Discord

I am trying to make a scraper for Discord to get all the members of a server, but I am stuck at the login. I can't find the CSRF token anywhere in the page source. A few sources say it is required, so that may be why I'm getting this error, but I'm not sure. Here is the spider causing the problem:

import scrapy
from scrapy.http import FormRequest

class RecruteSpider(scrapy.Spider):
    name = "Recruteur"

    def start_requests(self):
        urls = [
            'https://discord.com/login',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.login)

    def login(self, response):

        url = 'https://discord.com/login'
        formdata = {"username":"SecretUserName", "password":"SecretPassword"}
        yield FormRequest.from_response(
            response = response,
            url = url,
            formdata = formdata,
            callback = self.afterLogin
        )


    def afterLogin(self, response):
        print("Success!!")
        #do stuff

When I run the program I get this error:

ValueError: No <form> element found in <200 https://discord.com/login>

Even though there clearly is a form element at that URL.

I have also tried passing the login URL as the response argument to FormRequest.from_response, but then I get this error:

AttributeError: 'str' object has no attribute 'encoding'

If you need any extra detail, feel free to ask. Any help is greatly appreciated, thanks in advance.

The error occurs because Discord renders the /login page with JavaScript, so the raw response that Scrapy downloads does not contain any form elements. You need to render the JavaScript using scrapy-playwright (my personal favourite), selenium, or scrapy-splash.
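One way to confirm this diagnosis is to count the form elements in the raw HTML before any JavaScript has run. The sketch below uses only the standard library. The two HTML strings are hypothetical stand-ins, not Discord's actual markup, and count_forms is a helper name introduced here for illustration. Inside a spider callback, checking whether response.css("form") returns anything should give the same answer.

```python
from html.parser import HTMLParser


class FormCounter(HTMLParser):
    """Counts <form> start tags seen while parsing an HTML document."""

    def __init__(self):
        super().__init__()
        self.forms = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        if tag == "form":
            self.forms += 1


def count_forms(html: str) -> int:
    """Return the number of <form> elements present in the given HTML text."""
    parser = FormCounter()
    parser.feed(html)
    parser.close()
    return parser.forms


# Hypothetical raw HTML of a JavaScript-rendered page: a mount point and a script.
js_shell = (
    '<html><body><div id="app-mount"></div>'
    '<script src="/assets/app.js"></script></body></html>'
)

# Hypothetical HTML of the same page after a browser has rendered it.
rendered = (
    '<html><body><form><input name="email">'
    '<input name="password"></form></body></html>'
)

print(count_forms(js_shell))
print(count_forms(rendered))
```

If the count for the raw response is zero, FormRequest.from_response has nothing to work with, and that is the situation the ValueError reports.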

Also, your formdata variable contains invalid keys: the payload uses login rather than username. See the screenshot of the payload the browser sends to the server.

[Screenshot: request payload]

Using scrapy-playwright, I was able to reach the callback function with the code below. Note that Discord may also require you to solve a captcha once you send the login request, which is a separate challenge you will need to handle.

discord.py

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy import FormRequest

class RecruteSpider(scrapy.Spider):
    name = "Recruteur"

    def start_requests(self):
        urls = ['https://discord.com/login']
        for url in urls:
            yield scrapy.Request(url=url, callback=self.login, meta={"playwright": True})

    def login(self, response):
        url = 'https://discord.com/login'
        formdata = {"login":"SecretUserName", "password":"SecretPassword"}
        yield FormRequest.from_response(
            response = response,
            url = url,
            formdata = formdata,
            callback = self.afterLogin
        )

    def afterLogin(self, response):
        print("Success!!")
        #do stuff


if __name__ == "__main__":
    process = CrawlerProcess(settings={
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
        "DOWNLOAD_HANDLERS": {
            "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        }, })
    process.crawl(RecruteSpider)
    process.start()

[Screenshot: sample scrapy run]
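If the spider lives in a regular Scrapy project and is started with scrapy crawl Recruteur instead of the __main__ block, the same two settings can go in the project's settings.py. This is a sketch under that assumption. Registering the handler for plain http as well is an addition here, not part of the code above.

```python
# settings.py (sketch): module-level equivalents of the CrawlerProcess settings

# scrapy-playwright needs the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Route downloads through the Playwright handler. Only requests that carry
# meta={"playwright": True} are rendered in a browser.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",  # added here
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
```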
