
How do I scrape to csv in scrapy

How do I scrape a web page to CSV? My CSV either does not appear or shows up blank.

I have run: scrapy crawl jobs -o output.csv. When the CSV does appear, there is nothing in it.

# -*- coding: utf-8 -*-
import scrapy


from scrapy import cmdline
cmdline.execute("scrapy crawl jobs".split())

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from craigslist_sample.items import CraigslistSampleItem

class MySpider(CrawlSpider):
    name = "jobs"
    allowed_domains = ["sfbay.craigslist.org"]
    start_urls = ["http://sfbay.craigslist.org/search/npo"]

    rules = (
        Rule(SgmlLinkExtractor(allow=(), restrict_xpaths=('//a[@class="button next"]',)), callback="parse_items", follow= True),
    )

    def parse_items(self, response):
        hxs = HtmlXPathSelector(response)
        titles = hxs.xpath('//span[@class="pl"]')
        items = []
        for titles in titles:
            item = CraigslistSampleItem()
            item["title"] = titles.xpath("a/text()").extract()
            item["link"] = titles.xpath("a/@href").extract()
            items.append(item)
        return(items)

    class MySpider(CrawlSpider):
        name = 'csvexample'
        start_urls = ['C:/example.csv']
        delimiter = ','
        headers = ['Address', 'Website']

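Try this. I think you have to yield each item separately. You create a new instance of the item class on every iteration but never actually return the individual items: you append them to a list and then return the list, so they never pass through the item pipeline. Also, in the loop over the titles you wrote "for titles in titles", where both names are plural, so the loop variable shadows the list it iterates over.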

# -*- coding: utf-8 -*-
import scrapy


from scrapy import cmdline
# cmdline.execute("scrapy crawl jobs".split()) -- Not sure what this line achieves?

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from craigslist_sample.items import CraigslistSampleItem

class MySpider(CrawlSpider):
    name = "jobs"
    allowed_domains = ["sfbay.craigslist.org"]
    start_urls = ["http://sfbay.craigslist.org/search/npo"]
    rules = (
        Rule(SgmlLinkExtractor(allow=(), restrict_xpaths=('//a[@class="button next"]',)), callback="parse_items", follow=True),
    )

    def parse_items(self, response):
        hxs = HtmlXPathSelector(response)
        titles = hxs.xpath('//span[@class="pl"]')
        for title in titles:
            item = CraigslistSampleItem()
            item["title"] = title.xpath("a/text()").extract_first()
            item["link"] = title.xpath("a/@href").extract_first()
            yield item
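
The two changes above (a singular loop variable and one yield per item) can be tried out without running a crawl. The following is only a standard-library sketch, not Scrapy code: parse_rows is a hypothetical stand-in for parse_items, the sample tuples are made-up data standing in for selector matches, and csv.DictWriter stands in for the CSV export that -o output.csv triggers.

```python
import csv
import io


def parse_rows(rows):
    """Hypothetical stand-in for parse_items: yield one dict per row."""
    # Singular loop names, so the input sequence is never shadowed.
    for title, link in rows:
        yield {"title": title, "link": link}


def to_csv(items):
    """Write the items as CSV text (a stand-in for a CSV feed export)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["title", "link"], lineterminator="\n"
    )
    writer.writeheader()
    for item in items:
        writer.writerow(item)
    return buffer.getvalue()


# Made-up sample data, for illustration only.
sample = [
    ("Volunteer Coordinator", "/npo/1.html"),
    ("Program Manager", "/npo/2.html"),
]
csv_text = to_csv(parse_rows(sample))
print(csv_text)
```

If the printed text has a header line plus one line per sample row, the generator is producing one item per match, which is the shape a CSV export needs.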

