export scrapy to text file
Is there a way to export Scrapy data to a text file, so that running the Python script generates the text file without having to execute Scrapy from the terminal?
Code example:
class NameListSpider(CrawlSpider):
    name = 'namelist'
    allowed_domains = ['namelist.com']
    start_urls = ['http://www.namelist.com']

    rules = (
        Rule(LinkExtractor(restrict_xpaths='//div[@class="post-outer"]/a'),
             callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        yield {
            'name': response.xpath('//div[@class="alt"]/span/span[2]/text()').get()
        }
        # have added the below as an example
        with open("file.txt", "a") as file:
            file.write(name)
There is more than one way to achieve this. If you want to run your project with scrapy crawl, you can configure a feed in the settings.
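For the scrapy crawl route, a minimal sketch of the relevant settings (the values are illustrative; FEED_URI and FEED_FORMAT are the pre-2.1 setting names, replaced by the FEEDS dict in newer Scrapy versions):

```python
# settings.py -- feed export configuration (sketch; output path is an example)
FEED_URI = 'file.txt'    # where exported items are written
FEED_FORMAT = 'csv'      # other built-in formats: 'json', 'jsonlines', 'xml'
```

With this in place, scrapy crawl namelist writes every item yielded by the spider to file.txt.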
If you want to run it with python your_python_script.py, you need to pass the settings as well.
You can even export different items to different files; for that, see this pipeline on GitHub.
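As an illustration of the pipeline idea (a hypothetical minimal pipeline, not the one linked above, with an assumed class name and output path), an item pipeline can append each scraped name to a plain text file, which matches the original goal of a .txt output:

```python
# pipelines.py -- hypothetical pipeline; enable it in settings.py with e.g.
# ITEM_PIPELINES = {'myproject.pipelines.TextFileExportPipeline': 300}
class TextFileExportPipeline:
    def open_spider(self, spider):
        # open the file once per crawl instead of once per item
        self.file = open('file.txt', 'a', encoding='utf-8')

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        # write one name per line
        self.file.write(item['name'] + '\n')
        return item
```

Unlike the open() call inside parse_item in the question, the pipeline opens the file once and Scrapy calls process_item for every yielded item.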
To run your spider with python your_script.py, you would do something like this:
# -*- coding: utf-8 -*-
from scrapy.settings import Settings
from scrapy.crawler import CrawlerRunner
from twisted.internet import reactor
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import Rule, CrawlSpider


class NameListSpider(CrawlSpider):
    name = 'namelist'
    allowed_domains = ['namelist.com']
    start_urls = ['http://www.namelist.com']

    rules = (
        Rule(LinkExtractor(restrict_xpaths='//div[@class="post-outer"]/a'),
             callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        yield {
            'name': response.xpath('//div[@class="alt"]/span/span[2]/text()').get()
        }


def get_settings():
    settings = Settings()
    settings.set('FEED_URI', 'file.txt')   # output file
    settings.set('FEED_FORMAT', 'csv')     # on Scrapy >= 2.1, prefer the FEEDS setting
    return settings


if __name__ == '__main__':
    settings = get_settings()
    runner = CrawlerRunner(settings)
    d = runner.crawl(NameListSpider)
    d.addBoth(lambda _: reactor.stop())  # stop the reactor when the crawl ends
    reactor.run()                        # blocks until the crawl finishes