
How do Scrapy's feed exports work when writing to the local filesystem?

I currently have a long-running Python script in a screen session on an AWS EC2 instance that executes commands like

from subprocess import call 

call('''scrapy crawl my_spider -a year=2005 -a month=may 
--set FEED_URI=/home/ubuntu/my_spider/data/2005_may.json 
--set FEED_FORMAT=jsonlines''', shell=True)

over all combinations of year and month for the years 2000-2017 and the months October through June. Many of the individual commands have completed, and I can reattach to the screen session and see that it's scraping data properly, but I see no files in /home/ubuntu/my_spider/data.

Will the files appear after the Python script completes, or should I stop it now because something is wrong?

FileFeedStorage opens the file as soon as the crawler starts the spider, so if no output file has appeared after startup, something is clearly wrong.
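For reference, here is a minimal sketch of what a local-filesystem feed storage does, paraphrasing the behavior described above rather than Scrapy's exact source. The key point is that the output file is created the moment the spider opens, not when the crawl finishes:

import os

class FileFeedStorage:
    def __init__(self, uri):
        # e.g. /home/ubuntu/my_spider/data/2005_may.json
        self.path = uri

    def open(self, spider):
        # Called when the spider starts: create any missing directories
        # and open the output file immediately, in append-binary mode.
        dirname = os.path.dirname(self.path)
        if dirname and not os.path.exists(dirname):
            os.makedirs(dirname)
        return open(self.path, 'ab')

    def store(self, file):
        # Called when the spider closes: items have already been written
        # through the exporter, so just close the file.
        file.close()

So an empty /home/ubuntu/my_spider/data while spiders are visibly running is itself the diagnostic: the storage backend never opened those paths.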

Strictly speaking, this doesn't answer the original question, but it still deserves mention. The issue turned out to be that with shell=True, the shell treats the embedded newlines of the triple-quoted string as command separators: only the first line, scrapy crawl my_spider -a year=2005 -a month=may, actually ran, while the --set FEED_URI and --set FEED_FORMAT lines were attempted as separate commands and failed, so the scraped data was never written to the specified file. Nothing was propagated back to the script because those errors only went to the shell's stderr and the return code of call was never checked. Changing it to

call(["scrapy", "crawl", "my_spider", 
  "-a", "year=2005", 
  "-a", "month=may", 
  "--set",  "FEED_URI=/home/ubuntu/my_spider/data/2005_may.json",
  "--set", "FEED_FORMAT=jsonlines"], cwd="/home/ubuntu/my_spider/")

worked, but it should be said that this is not suggested practice for running Scrapy from a script.
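For completeness, the documented way to run Scrapy from a script is CrawlerProcess, which runs the crawl in-process instead of shelling out. A minimal sketch, reusing the spider name and paths from the question and assuming the script lives inside the Scrapy project:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

settings = get_project_settings()
settings.set('FEED_URI', '/home/ubuntu/my_spider/data/2005_may.json')
settings.set('FEED_FORMAT', 'jsonlines')

process = CrawlerProcess(settings)
# Spider arguments are passed as keyword arguments, like -a on the CLI.
process.crawl('my_spider', year=2005, month='may')
process.start()  # blocks until the crawl finishes

One caveat: Twisted's reactor can only be started once per process, so to sweep many year/month combinations this way you would need to schedule every crawl before calling start(), or else keep launching one subprocess per crawl as above.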
