Newbie question about Scrapy pipelines.py

I am studying the Scrapy tutorial. To test the process, I created a new project with these files:

See my post in the Scrapy group for links to the scripts; I cannot post more than one link here.

The spider runs well: it scrapes the text between the title tags and puts it into a FirmItem:

[whitecase.com] INFO: Passed FirmItem(title=[u'White & Case LLP - Lawyers - Rachel B. Wagner ']) 

But I am stuck on the pipeline step. I want to write this FirmItem to a CSV file so that I can load it into the database.

I am new to Python and learning as I go along. I would appreciate it if someone could give me a clue about how to make pipelines.py work so that the scraped data is written to items.csv.

Thank you.

I think your specific question is addressed in the Scrapy tutorial.

It suggests using the csv module, as others here have. Place the following in your pipelines.py file:

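import csv

class CsvWriterPipeline(object):

    def open_spider(self, spider):
        # Called once when the spider starts; newline='' lets csv control line endings
        self.file = open('items.csv', 'w', newline='', encoding='utf-8')
        self.csvwriter = csv.writer(self.file)

    def close_spider(self, spider):
        # Called once when the spider finishes; flushes and releases the file
        self.file.close()

    def process_item(self, item, spider):
        # Scrapy calls this with (item, spider); each field holds a list of extracted strings
        self.csvwriter.writerow([item['title'][0], item['link'][0], item['desc'][0]])
        return item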

Don't forget to enable the pipeline by adding it to the ITEM_PIPELINES setting in your settings.py, like this:

# Recent Scrapy versions expect a dict; the number (0-1000) sets the run order, lower first
ITEM_PIPELINES = {'dmoz.pipelines.CsvWriterPipeline': 300}

Adjust the module path (dmoz is the tutorial's project name) and the item fields (title, link, desc) to suit your project.
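If you also want a header row, a variant of CsvWriterPipeline can use csv.DictWriter. The sketch below is hypothetical: the class name CsvDictWriterPipeline and the field list are assumptions to adapt to your item, and it joins list-valued fields (like title=[u'...'] in the log above) into a single cell instead of taking only the first element.

```python
import csv


class CsvDictWriterPipeline:
    """Hypothetical pipeline: writes a header row, then one CSV row per item."""

    # Assumed field names; change them to match your item (e.g. FirmItem)
    fields = ['title', 'link', 'desc']

    def __init__(self, path='items.csv'):
        self.path = path

    def open_spider(self, spider):
        self.file = open(self.path, 'w', newline='', encoding='utf-8')
        self.writer = csv.DictWriter(self.file, fieldnames=self.fields)
        self.writer.writeheader()

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        row = {}
        for name in self.fields:
            # Missing fields become empty cells
            value = item.get(name, '')
            # Extracted fields are lists of strings; join them into one cell
            if isinstance(value, list):
                value = ' '.join(v.strip() for v in value)
            row[name] = value
        self.writer.writerow(row)
        return item
```

Because pipelines are plain classes, you can exercise this one outside Scrapy by calling open_spider(None), process_item(some_dict, None) and close_spider(None) in turn; inside a project you would enable it in ITEM_PIPELINES like the pipeline above.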

Use the built-in CSV feed export (available in v0.10) and CsvItemExporter.
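With feed exports you may not need a custom pipeline at all, since the CSV feed uses CsvItemExporter for you. A minimal settings.py sketch, assuming Scrapy 2.1 or later, where the FEEDS setting replaced the older FEED_URI / FEED_FORMAT pair (on old releases the equivalent was FEED_URI = 'items.csv' and FEED_FORMAT = 'csv'):

```python
# settings.py (assumes Scrapy 2.1+; older releases used FEED_URI / FEED_FORMAT)
# Export every scraped item to items.csv with the built-in CSV exporter
FEEDS = {
    'items.csv': {
        'format': 'csv',
        'encoding': 'utf8',
    },
}
```

For a one-off run, recent versions also accept the command-line option -o items.csv on scrapy crawl.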

Python has a module for reading and writing CSV files. This is safer than writing the output yourself, because the module gets all the quoting and escaping right for you.

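import csv

# 'with' closes the file; a csv.writer object itself has no close() method
with open('items.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    # Scrapy items use dict-style access, and extracted fields are lists
    writer.writerow([firmitem['title'][0], firmitem['url']])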
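To see the quoting point in practice, here is a small check with hypothetical sample values: a field containing both a comma and double quotes survives a round trip through the csv module, while naive comma-joining splits it.

```python
import csv
import io

# Hypothetical sample row: the second field contains a comma and quotes
row = ['White & Case LLP', 'Rachel "B." Wagner, Esq.']

buf = io.StringIO()
csv.writer(buf).writerow(row)
print(repr(buf.getvalue()))
# 'White & Case LLP,"Rachel ""B."" Wagner, Esq."\r\n'

# Reading it back recovers the original fields exactly
parsed = next(csv.reader(io.StringIO(buf.getvalue())))
print(parsed == row)  # True

# Joining by hand leaves the comma unquoted, so the row parses as 3 fields
naive = ','.join(row)
print(len(next(csv.reader(io.StringIO(naive)))))  # 3
```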

Open a file and write to it directly:

# Tab-separated: a header row, then one row of values
with open('my.csv', 'w') as f:
    f.write('h1\th2\th3\n')
    f.write('\t'.join(str(v) for v in (my_class.v1, my_class.v2, my_class.v3)) + '\n')
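The same tab-separated output can be produced with the csv module, which quotes any value that itself contains a tab or newline. A sketch, where write_tsv is a hypothetical helper name; lineterminator='\n' is chosen to match the '\n' endings of the manual version, and csv.writer converts non-string values with str() (None becomes an empty cell).

```python
import csv


def write_tsv(path, header, rows):
    """Hypothetical helper: write a header and rows as tab-separated values."""
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f, delimiter='\t', lineterminator='\n')
        writer.writerow(header)
        writer.writerows(rows)
```

Usage with the object from the snippet above would be write_tsv('my.tsv', ['h1', 'h2', 'h3'], [[my_class.v1, my_class.v2, my_class.v3]]).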

Or print your results to stdout and redirect stdout to a file: ./my_script.py >> res.txt (>> appends to the file; > overwrites it).
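If you go the stdout route, the csv module can still do the quoting by writing straight to sys.stdout. A sketch, with a hypothetical write_rows helper and placeholder sample data:

```python
#!/usr/bin/env python3
import csv
import sys


def write_rows(rows, out=None):
    """Write rows as CSV to `out`, defaulting to the current sys.stdout."""
    writer = csv.writer(out if out is not None else sys.stdout)
    writer.writerows(rows)


if __name__ == '__main__':
    # Placeholder rows; in practice these would be your scraped fields
    write_rows([['title'], ['White & Case LLP - Lawyers - Rachel B. Wagner']])
```

Resolving sys.stdout at call time, rather than as a default argument, means the function still follows any later redirection of sys.stdout inside Python.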
