
scrapy run spider from path

Many suggestions for running Scrapy recommend the following in order to start Scrapy from a script, debug in an IDE, and so on:

from scrapy import cmdline

cmdline.execute(("scrapy runspider spider-file-name.py").split())

This works as long as the script is placed in the project directory; if it is not, you can try giving it an absolute or relative path instead. For example:

import os

from scrapy import cmdline

this_file_path = os.path.dirname(os.path.realpath(__file__))
base_path = this_file_path.replace('bootstrap', '')
full_path = base_path + "path/to/spiders/some-spider.py"
print(full_path)

cmdline.execute(("scrapy runspider " + full_path).split())

With this, I get:

2016-09-28 10:49:29 [scrapy] INFO: Scrapy 1.1.2 started (bot: scrapybot)
2016-09-28 10:49:29 [scrapy] INFO: Overridden settings: {}
Usage
=====
  scrapy runspider [options] <spider_file>

spider-main.py: error: Unable to load '/Users/name/intellij-workspace/crawling/scrape/scrape/spiders/some-spider.py': No module named items

Is there a way to run and debug scrapy spiders from an absolute path? Ideally, I need to have this to debug in an IDE.
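The error occurs because `runspider` loads the spider file standalone: when launched from outside the project, the project root is not on Python's module search path, so the spider's `from items import ...`-style import cannot resolve. A minimal stdlib-only sketch of the same failure, using a throwaway `items.py` / `spiders/spider.py` layout that is purely hypothetical and just mirrors the project structure above:

```python
import os
import subprocess
import sys
import tempfile

# Recreate the failure in miniature: a throwaway "project" whose
# spider-like script does "from items import ..." (hypothetical layout
# mirroring the Scrapy project in the question).
project = tempfile.mkdtemp()
os.makedirs(os.path.join(project, "spiders"))
with open(os.path.join(project, "items.py"), "w") as f:
    f.write("VALUE = 'loaded'\n")
with open(os.path.join(project, "spiders", "spider.py"), "w") as f:
    f.write("from items import VALUE\nprint(VALUE)\n")

# Invoking the script by absolute path from *outside* the project:
# Python's import search does not include the project root, so
# "from items import VALUE" fails -- analogous to the
# "No module named items" error above.
result = subprocess.run(
    [sys.executable, os.path.join(project, "spiders", "spider.py")],
    capture_output=True, text=True,
)
print(result.returncode)         # non-zero: the import failed
print("items" in result.stderr)  # True: the traceback names "items"
```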

Using dedicated distributed crawling software is highly advised, but if you really want to do it this way just for some quick-and-dirty testing, here it is:

import subprocess

# Run scrapy with the project root as the working directory so the
# project's own modules (e.g. items) can be resolved.
project_path = "/Users/name/intellij-workspace/crawling/scrape"
subprocess.Popen(["scrapy", "runspider", "scrape/spiders/some-spider.py"], cwd=project_path)
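The `cwd=project_path` argument is what makes this work: with the project root as the working directory, the project's own modules (such as `items`) become importable again. A stdlib-only sketch of that mechanism using a throwaway project layout (the `sys.path.insert` line in the generated script is a stand-in for Scrapy picking the project up from the current directory):

```python
import os
import subprocess
import sys
import tempfile

# Throwaway "project": items.py plus a spider-like script that imports it.
project = tempfile.mkdtemp()
os.makedirs(os.path.join(project, "spiders"))
with open(os.path.join(project, "items.py"), "w") as f:
    f.write("VALUE = 'loaded'\n")
with open(os.path.join(project, "spiders", "spider.py"), "w") as f:
    f.write(
        "import os, sys\n"
        # Stand-in for Scrapy resolving the project from the working directory.
        "sys.path.insert(0, os.getcwd())\n"
        "from items import VALUE\n"
        "print(VALUE)\n"
    )

# With cwd set to the project root, the import resolves.
out = subprocess.run(
    [sys.executable, os.path.join("spiders", "spider.py")],
    cwd=project, capture_output=True, text=True,
)
print(out.stdout.strip())  # -> loaded
```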
