I am studying the Scrapy tutorial. To test the process, I created a new project with these files:
See my post in the Scrapy group for links to the scripts; I cannot post more than one link here.
The spider runs well: it scrapes the text between the title tags and puts it into a FirmItem:
[whitecase.com] INFO: Passed FirmItem(title=[u'White & Case LLP - Lawyers - Rachel B. Wagner '])
But I am stuck on the pipeline step. I want to write this FirmItem to a CSV file so that I can load it into the database.
I am new to Python and I am learning as I go along. I would appreciate it if someone could give me a clue about how to make pipelines.py work so that the scraped data is written to items.csv.
Thank you.
I think they address your specific question in the Scrapy Tutorial.
It suggests, as others here have, using the csv module. Place the following in your pipelines.py file:
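import csv

class CsvWriterPipeline(object):

    def __init__(self):
        # Python 3: open in text mode with newline='' (the original 'wb' only works on Python 2)
        self.csvwriter = csv.writer(open('items.csv', 'w', newline=''))

    def process_item(self, item, spider):
        # Current Scrapy calls process_item(item, spider); older releases used a different argument order
        self.csvwriter.writerow([item['title'][0], item['link'][0], item['desc'][0]])
        return item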
Don't forget to enable the pipeline by adding it to the ITEM_PIPELINES setting in your settings.py, like this:
# Current Scrapy expects a dict; the number (0-1000) orders the pipelines, lower runs first
ITEM_PIPELINES = {'dmoz.pipelines.CsvWriterPipeline': 300}
Adjust to suit the specifics of your project.
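A slightly more complete sketch, assuming a recent Scrapy release where a pipeline may define `open_spider`/`close_spider` hooks alongside `process_item(self, item, spider)`: the file is opened once when the spider starts and closed when it finishes, so buffered rows are flushed. The class name `FirmCsvPipeline`, the `path` parameter and the single `title` column are hypothetical choices for the FirmItem in the question. Nothing here imports Scrapy, because pipelines are plain classes.

```python
import csv


class FirmCsvPipeline:
    """Hypothetical pipeline that writes each FirmItem's title to a CSV file."""

    def __init__(self, path="items.csv"):
        # Scrapy builds pipelines without arguments, so the path has a default.
        self.path = path
        self.file = None
        self.writer = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the file and write a header row.
        self.file = open(self.path, "w", newline="", encoding="utf-8")
        self.writer = csv.writer(self.file)
        self.writer.writerow(["title"])

    def close_spider(self, spider):
        # Called once when the spider finishes: closing flushes any buffered rows.
        self.file.close()

    def process_item(self, item, spider):
        # Items are dict-like; the title field usually holds a list of extracted strings.
        title = item["title"]
        if isinstance(title, list):
            title = title[0] if title else ""
        self.writer.writerow([title.strip()])
        return item  # pass the item on to any later pipelines
```

To enable it, register it in settings.py, e.g. `ITEM_PIPELINES = {'myproject.pipelines.FirmCsvPipeline': 300}`, where `myproject` stands for your own project package.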
Use the built-in CSV feed exports (available in v0.10) together with the CsvItemExporter.
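With feed exports no pipeline code is needed at all. A sketch for settings.py, assuming Scrapy 2.1 or later, where the `FEEDS` setting exists (in v0.10 the older `FEED_URI`/`FEED_FORMAT` settings played this role). The `title` field list is an assumption based on the FirmItem in the question.

```python
# settings.py: let Scrapy's built-in feed exports write the CSV file.
# Each key is an output URI; its options dict selects the export format.
FEEDS = {
    "items.csv": {
        "format": "csv",  # serialized by CsvItemExporter
    },
}

# Optional: choose which item fields are exported, and in what column order.
FEED_EXPORT_FIELDS = ["title"]
```

For a one-off run, passing `-o items.csv` to `scrapy crawl` on the command line has a similar effect without touching settings.py.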
Python has a module for reading and writing CSV files; this is safer than writing the output yourself (and having to get all the quoting and escaping right).
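import csv

# newline='' lets the csv module control line endings; the with block closes the file
with open('items.csv', 'w', newline='') as f:
    csvfile = csv.writer(f)  # csv.writer objects have no close(); close the file instead
    # Scrapy items are dict-like, so fields are read with item['field']
    csvfile.writerow([firmitem['title'], firmitem['url']])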
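To see what the csv module buys you, here is a small self-contained check that writes to an in-memory buffer (`io.StringIO`) instead of a file; the sample values are made up. A field containing a comma gets quoted, an embedded double quote gets doubled, and `csv.reader` reads back exactly the original values.

```python
import csv
import io

# Made-up values: one contains a comma, the other embedded double quotes.
row = ['White & Case LLP, Lawyers', 'Rachel "Rae" Wagner']

buf = io.StringIO()
csv.writer(buf).writerow(row)  # default "excel" dialect, minimal quoting
text = buf.getvalue()
print(repr(text))  # '"White & Case LLP, Lawyers","Rachel ""Rae"" Wagner"\r\n'

# Reading it back recovers the original fields unchanged.
parsed = next(csv.reader(io.StringIO(text)))
print(parsed == row)  # True
```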
Alternatively, open a file and write to it directly (tab-separated here):
f = open('my.csv', 'w')
f.write('h1\th2\th3\n')  # header row, tab-separated
f.write(my_class.v1 + '\t' + my_class.v2 + '\t' + my_class.v3 + '\n')  # one data row; v1..v3 must be strings
f.close()
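The hand-written version breaks as soon as a value contains a tab or newline, and the `+` concatenation fails if an attribute is not a string. A sketch of the same tab-separated output using `csv.writer(..., delimiter='\t')`, which handles both; the helper name `write_tsv` and its arguments are hypothetical.

```python
import csv


def write_tsv(path, header, rows):
    """Write a header plus data rows as tab-separated values (hypothetical helper)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(header)
        # Non-string values are converted with str(); fields containing a tab,
        # quote or newline are quoted automatically.
        writer.writerows(rows)
```

For example, `write_tsv('my.csv', ['h1', 'h2', 'h3'], [[my_class.v1, my_class.v2, my_class.v3]])` replaces the four lines of manual writing.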
Or print your results to stdout and redirect stdout to a file: ./my_script.py >> res.txt (">>" appends to res.txt; use ">" to overwrite it).
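If you go the stdout route, you can still let the csv module do the quoting by pointing a writer at `sys.stdout`. A sketch with a hypothetical `emit_rows` function; `lineterminator='\n'` is an assumption chosen so redirected output uses plain newlines instead of the csv default `\r\n`.

```python
import csv
import sys


def emit_rows(rows):
    """Print rows to stdout as CSV so the shell can redirect them to a file."""
    writer = csv.writer(sys.stdout, lineterminator="\n")
    for row in rows:
        writer.writerow(row)


if __name__ == "__main__":
    emit_rows([["title"], ["White & Case LLP - Lawyers - Rachel B. Wagner"]])
```

Run it as `./my_script.py > items.csv`; the shell, not the script, decides where the rows end up.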