
How to stop Scrapy 301 redirects and stop parsing the redirected page

I am trying to crawl a page that, for whatever reason, redirects Scrapy via a 301 to the English version. The redirected page then gets parsed, which it should not, because the rules clearly exclude that URL.

While searching for a way to stop all redirects, I came across the following code:

meta = {'dont_redirect': True}

Unfortunately, this has no effect. My spider class looks like this:

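from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class GetbidSpider(CrawlSpider):
    name = 'test'
    meta = {'dont_redirect': True}  # intended to disable redirects
    allowed_domains = ['www.example.de']
    start_urls = ['https://www.example.url/bla.html']

    rules = (

        Rule(
            LinkExtractor(allow=[r'.*Mein-String.*[a-z]::[0-9].*']),
            callback='parse_item'
        ),

        Rule(
            # raw string so the regex escapes are not treated as Python escapes
            LinkExtractor(allow=[r'^.*de\/((?!My-String|:_:|productListingStyle|\.php).)*$']),
            follow=True
        ),
    )

    def parse_item(self, response):
        # placeholder: the callback named in the first rule is not shown in the question
        yield {'url': response.url}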

Is this the right place to configure the redirect, and why does Scrapy parse the redirected page even though the first URL rule does not match it?

Why what you tried doesn't work:

  • The rules only determine which requests your spider creates; they don't control redirect logic.
  • meta is an attribute of a Request and only works on a per-request basis; setting it as a spider class attribute does nothing.

How to disable redirects:

The easiest way to disable redirects globally is to set the REDIRECT_ENABLED setting to False.
