簡體   English   中英

無法使用抓取框架 307 重定向錯誤抓取 myntra API 數據

[英]unable to scrape myntra API data using scrapy framework 307 redirect error

下面是蜘蛛代碼:

import scrapy
class MyntraSpider(scrapy.Spider):

    custom_settings = {
        'HTTPCACHE_ENABLED': False,
        'dont_redirect': True,
        #'handle_httpstatus_list' : [302,307],
        #'CRAWLERA_ENABLED': False,
        'USER_AGENT': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36',
    }


    name = "heytest"
    allowed_domains = ["www.myntra.com"]
    start_urls = ["https://www.myntra.com/web/v2/search/data/duke"]
    def parse(self, response):
        self.logger.debug('Parsed jabong.com')

“解析的jabong.com”沒有被記錄。 實際上,回調方法(解析)沒有被調用。 請回。

請從抓取中心找到錯誤日志:

另見郵遞員截圖

我運行此代碼(僅運行幾次)並且獲取數據沒有問題。

它看起來與您的代碼相似,所以我不知道您為什么遇到問題。

也許他們出於某種原因阻止了你。

#!/usr/bin/env python3

import scrapy
import json

class MySpider(scrapy.Spider):

    name = 'myspider'

    allowed_domains = ['www.myntra.com']

    start_urls = ['https://www.myntra.com/web/v2/search/data/duke']

    #def start_requests(self):
    #    for tag in self.tags:
    #        for page in range(self.pages):
    #            url = self.url_template.format(tag, page)
    #            yield scrapy.Request(url)

    def parse(self, response):
        print('url:', response.url)

        #print(response.body)

        data = json.loads(response.body)

        print('data.keys():', data.keys())

        print('meta:', data['meta'])

        print("data['data']:", data['data'].keys())

        # download files
        #for href in response.css('img::attr(href)').extract():
        #   url = response.urljoin(src)
        #   yield {'file_urls': [url]}

        # download images and convert to JPG
        #for src in response.css('img::attr(src)').extract():
        #   url = response.urljoin(src)
        #   yield {'image_urls': [url]}

# --- it runs without project and saves in `output.csv` ---

from scrapy.crawler import CrawlerProcess

c = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0',
    #'USER_AGENT': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36',

    # save in CSV or JSON
    'FEED_FORMAT': 'csv',     # 'json
    'FEED_URI': 'output.csv', # 'output.json

    # download files to `FILES_STORE/full`
    # it needs `yield {'file_urls': [url]}` in `parse()`
    #'ITEM_PIPELINES': {'scrapy.pipelines.files.FilesPipeline': 1},
    #'FILES_STORE': '/path/to/valid/dir',

    # download images and convert to JPG
    # it needs `yield {'image_urls': [url]}` in `parse()`
    #'ITEM_PIPELINES': {'scrapy.pipelines.files.ImagesPipeline': 1},
    #'IMAGES_STORE': '/path/to/valid/dir',

    #'HTTPCACHE_ENABLED': False,
    #'dont_redirect': True,
    #'handle_httpstatus_list' : [302,307],
    #'CRAWLERA_ENABLED': False,
})
c.crawl(MySpider)
c.start()

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM