Scrapy login authentication not working

Question

Hi, I'm using Scrapy to log in to a website. I followed the Scrapy tutorials, but it doesn't seem to be working. When I try it, I notice that "isAuthenticated" is False, and the HTML body I get back doesn't contain everything that the actual website does. I'm not sure what the issue is. I thought it was the CSRF token, but after some research I found that Scrapy should handle that. Here is the code below. Any suggestions?

  import scrapy
  import sys
  from scrapy import Spider
  from scrapy import Request

  class IvanaSpider(Spider):
     name = 'ivanaSpider'

     def start_requests(self):
        return [scrapy.FormRequest(
              'https://bitbucket.org/account/signin/?next=/',
              formdata={'username': 'username', 'password': 'password',
                       'form_build_id': 'form-v14V92zFkSSVFSerfvWyH1WEUoxrV2khjfhAETJZydk',
                       'form_id': 'account_api_form',
                       'op': 'Sign in'
              },
              callback=self.after_login)]

     def after_login(self, response):
        # check login succeed before going on
        if "It's recommended that you log in" in response.body:
           print "------------------------------------------"
           self.logger.error("Login failed")
           return

        # continue scraping with authenticated session...
        for line in response.xpath('//body').extract():
           print line.encode(sys.stdout.encoding, errors='replace')

Answer 1

To log in to a website you need to use FormRequest, but some websites, i.e. Bitbucket in your example, use predefined form attributes such as a CSRF token, session info and other tokens. These values are only valid when they are taken from the page the user visited just before submitting the form, so hard-coding them in formdata does not work.

In such cases, you can use Scrapy's FormRequest.from_response method, which collects all the predefined parameters from the form in the response and posts them together with the formdata you supply.

# For example
import scrapy


class IvanaSpider(scrapy.Spider):
    name = 'ivanaSpider'
    # Fetch the sign-in page first so the form's hidden fields can be read.
    start_urls = (
        'https://bitbucket.org/account/signin/?next=/',
    )

    def parse(self, response):
        # Build the login POST from the form on the sign-in page, so its
        # hidden fields (CSRF token etc.) are sent along with the credentials.
        yield scrapy.FormRequest.from_response(
            response=response,
            formdata={"username": "<your username>",
                      "password": "<your password>"},
            # The page has several social login forms, so select the right one
            # by XPath (form id) instead of using formname="login".
            formxpath=".//form[@id='aid-login-form']",
            callback=self.after_login,
            dont_click=True,
        )

    def after_login(self, response):
        # check that login succeeded before going on
        if "It's recommended that you log in" in response.text:
            print("------------------------------------------")
            self.logger.error("Login failed")
            return

        # continue scraping with the authenticated session...
        for line in response.xpath('//body').extract():
            print(line)
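
To make it clearer why the hard-coded tokens in the question fail, here is a rough, standard-library-only sketch of the idea behind FormRequest.from_response: read the input fields of one specific form from the page that was just downloaded, then overlay the credentials on top. This is not Scrapy's actual implementation. The helper names (HiddenInputCollector, build_login_payload), the sample HTML, and the field name csrfmiddlewaretoken are all made up for illustration; only the form id aid-login-form comes from the answer above.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input> tags inside the form with a given id."""

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == self.form_id:
            self.in_form = True
        elif tag == "input" and self.in_form and attrs.get("name"):
            # Keep whatever value the server pre-filled (e.g. a CSRF token).
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def build_login_payload(html, form_id, credentials):
    """Merge the form's pre-filled fields with the user's credentials."""
    collector = HiddenInputCollector(form_id)
    collector.feed(html)
    payload = dict(collector.fields)
    payload.update(credentials)  # credentials override the empty inputs
    return payload


# Hypothetical sign-in page: two forms, only one of which is the login form.
SAMPLE_PAGE = """
<form id="social-login"><input type="hidden" name="provider" value="google"></form>
<form id="aid-login-form" method="post">
  <input type="hidden" name="csrfmiddlewaretoken" value="token-from-this-session">
  <input type="text" name="username" value="">
  <input type="password" name="password" value="">
</form>
"""

payload = build_login_payload(
    SAMPLE_PAGE,
    "aid-login-form",
    {"username": "<your username>", "password": "<your password>"},
)
print(urlencode(payload))  # the body a login POST would carry
```

The token in the payload comes from the page fetched in the current session, which is the reason the spider in the answer requests the sign-in page first instead of posting fixed values from start_requests.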

Solution 1
0 2016-07-02 04:57:29
