
Scrapy - WARNING: Remote certificate is not valid for hostname

I'm using a CrawlSpider with LinkExtractor objects to crawl next pages and other links from a homepage. I've got two link extractors: one to crawl the next pages and another to crawl event links (see the spider code below).

My second LinkExtractor (event links) works, but the first one doesn't.
I get this message in the log when I launch my spider:

[scrapy] WARNING: Remote certificate is not valid for hostname "marathons.ahotu.fr"; u'ssl390453.cloudflaressl.com'!=u'marathons.ahotu.fr'

I'm a novice in Python and Scrapy, so my questions are:

  • What does it mean?
  • How can I fix it?

Here is my spider code:

import scrapy
import os
import re
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from scrapy.selector import Selector

if os.path.isfile('ListeCAP_Marathons_ahotu.csv'):
    reecritureFichier = open('ListeCAP_Marathons_ahotu.csv', 'w')
    reecritureFichier.truncate()
    reecritureFichier.close()

class MySpider(CrawlSpider):
    name = 'ListeCAP_Marathons_ahotu'
    start_urls = ['https://marathons.ahotu.fr/calendrier']

    rules = (
        # LINKEXTRACTOR N°1 = NEXT PAGES
        Rule(LinkExtractor(allow=('https://marathons.ahotu.fr/calendrier?page=[0-9]{1,100}#list-top',),),),

        # LINKEXTRACTOR N°2 = EVENTS LINKS
        Rule(LinkExtractor(allow=('https://marathons.ahotu.fr/evenement/.+',),),follow=True,callback='parse_item'),      
    )     

    def parse_item(self, response):
        selector = Selector(response)
        yield {
            'nom_even': selector.xpath('/html/body/div[2]/div[2]/h1/span[@itemprop="name"]/text()').extract(),
        }

        print('--------------------> NOM DE L\'EVENEMENT :', selector.xpath('//*[@id="jog"]/div[2]/section/article/header/h1/text()').extract())

(I'm using Scrapy 1.4.0 with Twisted-17.9.0)
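A note on rule n°1, separate from the certificate warning: the `allow` argument of a LinkExtractor is treated as a regular expression, so the unescaped `?` and `.` in that pattern are metacharacters rather than literal characters. The sketch below uses only the standard `re` module to compare the pattern as written with an escaped variant. The sample URL is hypothetical, assumed from the shape of the pattern; the links actually present on the page may look different.

```python
import re

# Pattern copied from rule n°1. Here "r?" means "an optional r",
# so the literal "?" of a query string is never matched.
original = r'https://marathons.ahotu.fr/calendrier?page=[0-9]{1,100}#list-top'

# Same pattern with the literal "." and "?" escaped (only the escaping changes).
escaped = r'https://marathons\.ahotu\.fr/calendrier\?page=[0-9]{1,100}#list-top'

# Hypothetical next-page URL, assumed from the shape of the pattern.
url = 'https://marathons.ahotu.fr/calendrier?page=2#list-top'

original_matches = re.search(original, url) is not None
escaped_matches = re.search(escaped, url) is not None

print('pattern as written matches:', original_matches)
print('escaped pattern matches:', escaped_matches)
```

If the escaped pattern matches the real pagination links, it may be worth trying it in the rule before concluding that the warning is what stops the first extractor.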

You can't fix this kind of problem from your side. The message means that the certificate presented by the server was issued for ssl390453.cloudflaressl.com and not for marathons.ahotu.fr, the hostname you requested. The best you can do is send a message to the administrator of the domain and let them know that the certificate has a problem.
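To illustrate what "not valid for hostname" means, here is a deliberately simplified sketch of a hostname check. The function `name_matches` is hypothetical and written only for this example; it is not the code Scrapy or Twisted run, and real certificate verification handles more cases (several names per certificate, internationalized names, stricter wildcard rules).

```python
def name_matches(cert_name, hostname):
    """Simplified check: exact match, or a single leading wildcard label.

    Hypothetical helper for illustration only, not a real TLS verifier.
    """
    cert_labels = cert_name.lower().split('.')
    host_labels = hostname.lower().split('.')
    if len(cert_labels) != len(host_labels):
        return False
    if cert_labels[0] == '*':
        # "*.example.org" covers exactly one extra label on the left.
        return cert_labels[1:] == host_labels[1:]
    return cert_labels == host_labels


requested = 'marathons.ahotu.fr'             # hostname the spider asked for
presented = 'ssl390453.cloudflaressl.com'    # name reported in the warning

print(name_matches(presented, requested))      # the mismatch behind the warning
print(name_matches('*.ahotu.fr', requested))   # a name that would have been accepted
```

The two names in the warning are the two sides of this comparison: the name found in the certificate on the left of `!=`, the requested hostname on the right.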

