
Export Scrapy data to a text file

Is there a way to export Scrapy data to a text file, so that running the Python script generates the file directly, without having to go through the terminal to run scrapy?

Code example

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class NameListSpider(CrawlSpider):
    name = 'namelist'
    allowed_domains = ['namelist.com']
    start_urls = ['http://www.namelist.com']

    rules = (
        Rule(LinkExtractor(restrict_xpaths='//div[@class="post-outer"]/a'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        yield {
            'name': response.xpath('//div[@class="alt"]/span/span[2]/text()').get()
        }

# Added below as an example of what I want to end up with
# (it does not work as written: name is not defined at module level)
with open("file.txt", "a") as file:
    file.write(name)

There is more than one way to achieve this.
If you want to run your project with scrapy crawl, you can configure the feed exports in your settings.
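A minimal sketch of that configuration in the project's settings.py, assuming Scrapy 2.1 or later (where the FEEDS setting supersedes FEED_URI and FEED_FORMAT). The file name file.txt comes from the question; the encoding value is an illustrative choice.

```python
# settings.py (sketch; assumes Scrapy >= 2.1, where FEEDS is available)
# Each key is an output URI, each value holds the options for that feed.
FEEDS = {
    "file.txt": {
        "format": "csv",     # built-in exporters include csv, json, jsonlines and xml
        "encoding": "utf8",  # write the file as UTF-8
    },
}
```

With this in place, running scrapy crawl namelist should write the scraped items to file.txt in CSV form: a name header line followed by one row per item.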
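If you want to run it with python your_python_script.py, you also need to pass the settings yourself.
You can even export different items to different files. For that, check out this pipeline on GitHub.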
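The GitHub pipeline itself is not reproduced in this answer. As a rough illustration of the same idea, here is a hypothetical item pipeline, SplitTextFilePipeline, which assumes each item may carry a category field. It writes each item's name to a separate text file per category, using only the standard open_spider / process_item / close_spider pipeline hooks.

```python
import os


class SplitTextFilePipeline:
    """Hypothetical pipeline: appends each item's 'name' to <category>.txt.

    Assumes items are dict-like and may have a 'category' key; the category
    value is used directly as a file name, so it should be a safe string.
    """

    def __init__(self, out_dir="output"):
        self.out_dir = out_dir
        self.files = {}  # category -> open file handle

    def open_spider(self, spider):
        # Make sure the output directory exists before any item arrives.
        os.makedirs(self.out_dir, exist_ok=True)

    def process_item(self, item, spider):
        # Items without a 'category' key all go to default.txt.
        category = item.get("category", "default")
        if category not in self.files:
            path = os.path.join(self.out_dir, f"{category}.txt")
            self.files[category] = open(path, "a", encoding="utf-8")
        self.files[category].write(f"{item.get('name', '')}\n")
        return item  # hand the item on to later pipelines and feed exports

    def close_spider(self, spider):
        # Flush and close every file opened during the crawl.
        for handle in self.files.values():
            handle.close()
        self.files.clear()
```

To enable it, you would add something like ITEM_PIPELINES = {"myproject.pipelines.SplitTextFilePipeline": 300} to your settings, where myproject.pipelines is a placeholder for wherever the class actually lives. Because the class has no from_crawler method, Scrapy constructs it without arguments, so the default out_dir is used.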

To run your spider with python your_script.py instead, you would do something like this:

# -*- coding: utf-8 -*-
from scrapy.settings import Settings
from scrapy.crawler import CrawlerRunner
from twisted.internet import reactor
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import Rule, CrawlSpider

class NameListSpider(CrawlSpider):
    name = 'namelist'
    allowed_domains = ['namelist.com']
    start_urls = ['http://www.namelist.com']
    rules = (
        Rule(LinkExtractor(restrict_xpaths='//div[@class="post-outer"]/a'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        yield {
            'name': response.xpath('//div[@class="alt"]/span/span[2]/text()').get()
        }

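def get_settings():
    settings = Settings()
    # Write every scraped item to file.txt in CSV format.
    # FEEDS needs Scrapy 2.1+; older versions use
    # settings.set('FEED_URI', 'file.txt') and settings.set('FEED_FORMAT', 'csv').
    settings.set('FEEDS', {'file.txt': {'format': 'csv'}})
    return settings

if __name__ == '__main__':
    settings = get_settings()
    runner = CrawlerRunner(settings)
    d = runner.crawl(NameListSpider)
    # Stop the Twisted reactor once the crawl finishes (or fails),
    # so the script exits instead of hanging.
    d.addBoth(lambda _: reactor.stop())
    reactor.run()  # blocks until reactor.stop() is called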

