scrapy run spider from path
Many suggestions for launching scrapy from a script, or for debugging it in an IDE, recommend doing this:
from scrapy import cmdline
cmdline.execute(("scrapy runspider spider-file-name.py").split())
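This works as long as the script is placed inside the project directory, but it fails when I give it an absolute or relative path instead. For example: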
import os
from scrapy import cmdline
this_file_path = os.path.dirname(os.path.realpath(__file__))
base_path = this_file_path.replace('bootstrap', '')
full_path = base_path + "path/to/spiders/some-spider.py"
print(full_path)
cmdline.execute(("scrapy runspider " + full_path).split())
With this, I get:
2016-09-28 10:49:29 [scrapy] INFO: Scrapy 1.1.2 started (bot: scrapybot)
2016-09-28 10:49:29 [scrapy] INFO: Overridden settings: {}
Usage
=====
scrapy runspider [options] <spider_file>
spider-main.py: error: Unable to load '/Users/name/intellij-workspace/crawling/scrape/scrape/spiders/some-spider.py': No module named items
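Is there a way to run and debug a scrapy spider from an absolute path? Ideally, I need to debug it in an IDE.
Using distributed crawler software is strongly recommended, but if you really want to do this just for some quick-and-dirty testing, here it is: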
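import subprocess
# Root directory of the scrapy project
project_path = "/Users/name/intellij-workspace/crawling/scrape"
# Run scrapy with the project directory as the working directory, so the relative spider path resolves
subprocess.Popen(["scrapy", "runspider", "scrape/spiders/some-spider.py"], cwd=project_path)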
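Since the subprocess runs outside the IDE's debugger, an in-process variant may be more convenient for setting breakpoints. The following is only a sketch: it assumes the "No module named items" error comes from the project directory being neither the working directory nor on sys.path, and that the project root is the nearest parent directory containing scrapy.cfg. The helper names find_project_root, inside_project and run_spider_file are made up for this example and are not part of scrapy.

```python
import os
import sys
from contextlib import contextmanager


def find_project_root(start_dir):
    """Walk upwards from start_dir until a directory containing scrapy.cfg is found."""
    current = os.path.abspath(start_dir)
    while True:
        if os.path.isfile(os.path.join(current, "scrapy.cfg")):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root without a match
            raise IOError("no scrapy.cfg found above " + start_dir)
        current = parent


@contextmanager
def inside_project(project_root):
    """Temporarily make project_root the working directory and importable."""
    previous_cwd = os.getcwd()
    os.chdir(project_root)
    sys.path.insert(0, project_root)
    try:
        yield
    finally:
        # Restore the previous working directory and drop the path entry we added
        os.chdir(previous_cwd)
        try:
            sys.path.remove(project_root)
        except ValueError:
            pass


def run_spider_file(spider_path):
    """Run a spider file in the current process (hypothetical helper)."""
    # Imported lazily so the helpers above can be used without scrapy installed
    from scrapy import cmdline

    spider_path = os.path.abspath(spider_path)
    project_root = find_project_root(os.path.dirname(spider_path))
    with inside_project(project_root):
        # cmdline.execute finishes by calling sys.exit(), so code after it does not run
        cmdline.execute(["scrapy", "runspider", spider_path])
```

Calling run_spider_file(full_path) from the bootstrap script would then stand in for the cmdline.execute line in the question. Whether this resolves the import error depends on how the spider imports its items module, so treat it as a starting point rather than a confirmed fix.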