I'm new to Python and Scrapy. I'm trying to follow the Scrapy tutorial, but I don't understand the logic of the storage step.
scrapy crawl spidername -o items.json -t json
scrapy crawl spidername --set FEED_URI=output.csv --set FEED_FORMAT=csv
I don't understand what the options in these commands mean.
Thank you for your help.
You can view a list of the available options by typing scrapy crawl -h
from within your project directory.
scrapy crawl spidername -o items.json -t json
-o
specifies the output filename for dumped items (items.json)

-t
specifies the format for dumping items (json)

scrapy crawl spidername --set FEED_URI=output.csv --set FEED_FORMAT=csv

--set
is used to set/override a setting

FEED_URI
is used to set the storage backend for the item dumping. In this example it is set to "output.csv", which uses the local filesystem, i.e. a simple output file (output.csv)

FEED_FORMAT
is used to set the serialization format for the (output) feed, i.e. csv in this example

References (Scrapy documentation):
--set
Arguments provided by the command line are the ones that take precedence, overriding any other options.
You can explicitly override one (or more) settings using the -s (or --set) command line option.
Example:
scrapy crawl myspider -s LOG_FILE=scrapy.log
sets the LOG_FILE setting's value to `scrapy.log`
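As an illustration of the KEY=VALUE form these -s/--set arguments take, here is a plain-Python sketch (a hypothetical helper, not Scrapy's actual parsing code) that splits one argument into a setting name and value:

```python
# Hypothetical helper (not Scrapy internals): split a -s/--set argument
# of the form "KEY=VALUE" into its setting name and value.
def parse_set_option(arg):
    # partition only on the first "=", so the value may itself contain "="
    key, _, value = arg.partition("=")
    return key, value

# e.g. parse_set_option("LOG_FILE=scrapy.log") -> ("LOG_FILE", "scrapy.log")
```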
-o
Specifies the output filename and extension, i.e. WHERE the scraped data will be written
Examples:
scrapy crawl quotes -o items.csv
scrapy crawl quotes -o items.json
scrapy crawl quotes -o items.xml
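The three examples above differ only in the file extension. A plain-Python sketch of the underlying idea (a hypothetical helper, not Scrapy's own code) is mapping that extension to a serialization format:

```python
import os

# Hypothetical mapping from output-file extension to serialization format,
# illustrating why "-o items.csv" and "-o items.json" produce different feeds.
EXTENSION_TO_FORMAT = {".json": "json", ".csv": "csv", ".xml": "xml"}

def infer_format(filename):
    # os.path.splitext("items.json") -> ("items", ".json")
    _, ext = os.path.splitext(filename)
    return EXTENSION_TO_FORMAT.get(ext)
```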
-t
Specifies the serialization format, i.e. HOW the items are written
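To make the filename-vs-format distinction concrete, here is a plain-Python illustration (using only the standard library, not Scrapy's exporters) of how the same two items look when serialized as json versus csv:

```python
import csv
import io
import json

# Two scraped items; the serialization format changes only HOW they are
# written out, not the data itself.
items = [
    {"title": "Quote 1", "author": "A"},
    {"title": "Quote 2", "author": "B"},
]

# json format: a single list of objects
json_feed = json.dumps(items)

# csv format: a header row followed by one row per item
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["title", "author"])
writer.writeheader()
writer.writerows(items)
csv_feed = buf.getvalue()
```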