
Python Scrapy -> Use a scrapy spider as a function

So I have the following Scrapy spider in spiders.py:

import scrapy

class TwitchSpider(scrapy.Spider):
    name = "clips"

    def start_requests(self):
        urls = [
            'https://www.twitch.tv/wilbursoot/clips?filter=clips&range=7d'
        ]
        # Issue the requests (the original snippet built the list but never yielded them)
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        for clip in response.css('.tw-tower'):
            yield {
                'title': clip.css('::text').get()
            }

But the key aspect is that I want to call this spider as a function from another file, instead of running scrapy crawl clips in the console. Where can I read more about this, or is it possible at all? I checked the Scrapy documentation, but I didn't find much.

I'm kind of a beginner-level developer, but maybe you could try making the entire thing a function and then importing it.

Put your other file in the same directory as your spider file, then import the spider module:

import spiders

You then have access to the spider class and can create a spider object:

spi = spiders.TwitchSpider()

Then you can call methods on that object, such as:

spi.parse(response)

This article shows how to import classes and functions from other Python files: https://csatlas.com/python-import-file-module/

Run the spider from main.py:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

from spiders import TwitchSpider

if __name__ == "__main__":
    settings = get_project_settings()
    # change/update settings:
    settings['USER_AGENT'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
    process = CrawlerProcess(settings)
    process.crawl(TwitchSpider)  # pass the spider class (or its name, "clips")
    process.start()

See the Scrapy documentation: Run Scrapy from a script.
