Printing Scrapy data to CSV

Question

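Hi, I recently started using Scrapy and wrote a crawler. When I export the data to CSV, everything ends up on a single line. How can I get each piece of data on its own line?

In my case I am printing links from a website. It works fine when I output JSON.

Here is the code.

The items.py file: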

import scrapy
from scrapy.item import Item ,Field
class ErcessassignmentItem(scrapy.Item):
# define the fields for your item here like:
# name = scrapy.Field()
link = Field()
#pass

mycrawler.py

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector # deprecated
from scrapy.selector import Selector
from ercessAssignment.items import ErcessassignmentItem

class MySpider(BaseSpider):
name ="ercessSpider"
allowed_domains =["site_url"]
start_urls = ["site_url"]

def parse(self, response):
    hxs = Selector(response)
    links = hxs.xpath("//p")
    items = []
    for linkk in links:
        item = ErcessassignmentItem()
        item["link"] = linkk.xpath("//a/@href").extract()
        items.append(item)
        return items

Answer 1

Your code needs proper indentation:

import scrapy
from scrapy.item import Item, Field
class ErcessassignmentItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    link = Field()

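Then, in your spider, don't use return inside the loop: the for loop runs only once and you get just one row in the CSV. Use yield instead.

Second, where is the code that writes the items to CSV? I assume you are relying on Scrapy's default way of storing items. In case you don't know it, run your scraper like this: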

scrapy crawl ercessSpider -o my_output.csv
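As far as I know, recent Scrapy releases (2.1 or later) also let you declare the same export in the project settings through the FEEDS setting instead of passing -o each time. The fragment below is only a sketch: the output file name is taken from the command above, and the "fields" list assumes the item has the single field link shown in the question.

```python
# settings.py (sketch; file name and field list are assumptions based on the question)
FEEDS = {
    "my_output.csv": {
        "format": "csv",     # select the CSV exporter for this feed
        "fields": ["link"],  # columns to export, in this order
    },
}
```

With this in place, a plain scrapy crawl ercessSpider should write the feed without the -o flag.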

Your spider code should look like this; note the changes I made:

from scrapy.spider import BaseSpider  # legacy name; newer Scrapy releases expose scrapy.Spider
from scrapy.selector import HtmlXPathSelector # deprecated
from scrapy.selector import Selector
from ercessAssignment.items import ErcessassignmentItem

class MySpider(BaseSpider):
    name = "ercessSpider"
    allowed_domains = ["site_url"]
    start_urls = ["site_url"]

    def parse(self, response):
        hxs = Selector(response)
        links = hxs.xpath("//p")
        for linkk in links:
            item = ErcessassignmentItem()
            # ".//" keeps the search inside the current <p>; a bare "//" would match every link on the page
            item["link"] = linkk.xpath(".//a/@href").extract()
            yield item  # yield instead of return, so the loop runs for every <p>
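The difference between returning inside the loop and yielding can be reproduced without Scrapy. The following is a minimal sketch in plain Python; the function names and the sample links are made up for illustration and are not part of the original spider.

```python
def collect_with_early_return(links):
    """Mimics the question's parse(): the return sits inside the loop body."""
    items = []
    for link in links:
        items.append({"link": link})
        return items  # exits the function during the first iteration


def collect_with_yield(links):
    """Mimics the corrected parse(): one item is produced per iteration."""
    for link in links:
        yield {"link": link}


links = ["/a", "/b", "/c"]  # hypothetical sample data

early = collect_with_early_return(links)
lazy = list(collect_with_yield(links))

print(len(early))  # number of items produced by the early return
print(len(lazy))   # number of items produced by the generator
```

The first function hands back a list with a single item because the loop never reaches its second iteration, while the generator produces one item per link.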

Answer 2

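for linkk in links:
    # one item per href, so each link becomes its own CSV row
    for href in linkk.xpath(".//a/@href").extract():
        item = ErcessassignmentItem()
        item["link"] = href
        yield item

This works well with CSS selectors too. If neither of the two solutions above works, you can try this.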
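To see why a list-valued field ends up in one CSV cell while one item per link gives one row each, here is a small sketch that uses only Python's csv module. It merely mimics the comma-joining that Scrapy's CSV exporter applies to list values by default; the helper rows_to_csv and the sample hrefs are hypothetical.

```python
import csv
import io


def rows_to_csv(rows):
    """Write dict rows with a single 'link' column and return the CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["link"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


hrefs = ["/a", "/b", "/c"]  # hypothetical links

# One item holding the whole list: the values are joined into a single cell.
single_cell = rows_to_csv([{"link": ",".join(hrefs)}])

# One item per link: every link gets its own row.
one_per_row = rows_to_csv([{"link": href} for href in hrefs])

print(single_cell)
print(one_per_row)
```

The first output has a header plus one data row containing all links; the second has a header plus one row per link, which is the layout the question asks for.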

Answer 3

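Your code above does not print anything, and I don't see any .csv part either. Also, the items list you create in parse() will never hold more than one element, because of what looks to me like an indentation error (you return after the first iteration of the for loop). For better readability, you can use a for/else construct here: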

def parse(self, response):
    hxs = Selector(response)
    links = hxs.xpath("//p")
    items = []
    for linkk in links:
        item = ErcessassignmentItem()
        # ".//" restricts the search to the current <p>
        item["link"] = linkk.xpath(".//a/@href").extract()
        items.append(item)
    else:                               # after for loop is finished
        # either return items
        # or print link in items here without returning
        for link in items:              # take one link after another
            print(link)                 # and print it in one line each
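A note on the for/else construct: the else branch runs once the loop finishes without hitting a break. Since the loop here contains no break, the branch always runs, so it behaves the same as code placed directly after the loop. A minimal sketch in plain Python, with a made-up function name and sample data:

```python
def collect(links):
    """Collect one dict per link, returning from the else branch of the loop."""
    items = []
    for link in links:
        items.append({"link": link})
    else:
        # No break inside the loop, so this branch always runs,
        # even when links is empty.
        return items


full = collect(["/a", "/b", "/c"])  # hypothetical sample data
empty = collect([])

print(len(full))
print(empty)
```

Placing the return at loop level, or in the else branch as above, avoids the early exit that an indented return causes.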

Answer 1
1 2018-10-29 08:32:57

Answer 2
1 2020-07-05 07:12:07

Answer 3
0 2018-10-29 08:09:57
