
scrapy run spider from path

Many suggestions for running Scrapy recommend the following in order to start Scrapy from a script, debug in an IDE, and so on:

from scrapy import cmdline

cmdline.execute(("scrapy runspider spider-file-name.py").split())

This works as long as the script is placed in the project directory; if it is not, you can try giving it an absolute or relative path instead. For example:

import os

from scrapy import cmdline

this_file_path = os.path.dirname(os.path.realpath(__file__))
base_path = this_file_path.replace('bootstrap', '')
full_path = base_path + "path/to/spiders/some-spider.py"
print(full_path)

cmdline.execute(("scrapy runspider " + full_path).split())

With this, I get:

2016-09-28 10:49:29 [scrapy] INFO: Scrapy 1.1.2 started (bot: scrapybot)
2016-09-28 10:49:29 [scrapy] INFO: Overridden settings: {}
Usage
=====
  scrapy runspider [options] <spider_file>

spider-main.py: error: Unable to load '/Users/name/intellij-workspace/crawling/scrape/scrape/spiders/some-spider.py': No module named items

Is there a way to run and debug scrapy spiders from an absolute path? Ideally, I need to have this to debug in an IDE.
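The error occurs because `runspider` loads the spider file standalone: when launched from outside the project, the project root is not on Python's module search path, so the spider's `from items import ...`-style import cannot resolve. A minimal stdlib-only sketch of the same failure, using a throwaway `items.py` / `spiders/spider.py` layout that is purely hypothetical and just mirrors the project structure above:

```python
import os
import subprocess
import sys
import tempfile

# Recreate the failure in miniature: a throwaway "project" whose
# spider-like script does "from items import ..." (hypothetical layout
# mirroring the Scrapy project in the question).
project = tempfile.mkdtemp()
os.makedirs(os.path.join(project, "spiders"))
with open(os.path.join(project, "items.py"), "w") as f:
    f.write("VALUE = 'loaded'\n")
with open(os.path.join(project, "spiders", "spider.py"), "w") as f:
    f.write("from items import VALUE\nprint(VALUE)\n")

# Invoking the script by absolute path from *outside* the project:
# Python's import search does not include the project root, so
# "from items import VALUE" fails -- analogous to the
# "No module named items" error above.
result = subprocess.run(
    [sys.executable, os.path.join(project, "spiders", "spider.py")],
    capture_output=True, text=True,
)
print(result.returncode)         # non-zero: the import failed
print("items" in result.stderr)  # True: the traceback names "items"
```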

Using dedicated distributed crawling software is highly advised, but if you really want to do it this way just for some quick-and-dirty testing, here it is:

import subprocess

# Run scrapy with the project root as the working directory so the
# project's own modules (e.g. items) can be resolved.
project_path = "/Users/name/intellij-workspace/crawling/scrape"
subprocess.Popen(["scrapy", "runspider", "scrape/spiders/some-spider.py"], cwd=project_path)
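The `cwd=project_path` argument is what makes this work: with the project root as the working directory, the project's own modules (such as `items`) become importable again. A stdlib-only sketch of that mechanism using a throwaway project layout (the `sys.path.insert` line in the generated script is a stand-in for Scrapy picking the project up from the current directory):

```python
import os
import subprocess
import sys
import tempfile

# Throwaway "project": items.py plus a spider-like script that imports it.
project = tempfile.mkdtemp()
os.makedirs(os.path.join(project, "spiders"))
with open(os.path.join(project, "items.py"), "w") as f:
    f.write("VALUE = 'loaded'\n")
with open(os.path.join(project, "spiders", "spider.py"), "w") as f:
    f.write(
        "import os, sys\n"
        # Stand-in for Scrapy resolving the project from the working directory.
        "sys.path.insert(0, os.getcwd())\n"
        "from items import VALUE\n"
        "print(VALUE)\n"
    )

# With cwd set to the project root, the import resolves.
out = subprocess.run(
    [sys.executable, os.path.join("spiders", "spider.py")],
    cwd=project, capture_output=True, text=True,
)
print(out.stdout.strip())  # -> loaded
```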
