
Printing scrapy data to csv

Hi, I started using Scrapy recently and wrote a crawler. But when I output the data to CSV, everything is printed in a single row. How can I print each item on its own row?

In my case I am printing links from a website. It works well when the output is in JSON format.

Here's the code.

The items.py file.

import scrapy
from scrapy.item import Item ,Field
class ErcessassignmentItem(scrapy.Item):
# define the fields for your item here like:
# name = scrapy.Field()
link = Field()
#pass

The mycrawler.py file.

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector # deprecated
from scrapy.selector import Selector
from ercessAssignment.items import ErcessassignmentItem

class MySpider(BaseSpider):
name ="ercessSpider"
allowed_domains =["site_url"]
start_urls = ["site_url"]

def parse(self, response):
    hxs = Selector(response)
    links = hxs.xpath("//p")
    items = []
    for linkk in links:
        item = ErcessassignmentItem()
        item["link"] = linkk.xpath("//a/@href").extract()
        items.append(item)
        return items

Your code needs proper indentation:

import scrapy
from scrapy.item import Item, Field

class ErcessassignmentItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    link = Field()

Then, in your spider, do not use return: your for loop will run only once and you will get only 1 row in the CSV. Use yield instead.

Second, where is your code that writes the items to CSV? I guess you are using Scrapy's default way of storing items. In case you do not already know, run your scraper like this:

scrapy crawl ercessSpider -o my_output.csv
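To see why a return inside the loop produces a single item while yield produces one per iteration, here is a minimal sketch in plain Python with no Scrapy involved. The function names and the sample links are made up for illustration.

```python
def collect_with_early_return(links):
    """Mimics the bug in the question: return sits inside the for loop."""
    items = []
    for link in links:
        items.append({"link": link})
        return items  # leaves the function during the first iteration


def collect_with_yield(links):
    """Generator version: hands back one item per loop iteration."""
    for link in links:
        yield {"link": link}


sample_links = ["/a", "/b", "/c"]  # hypothetical data, not from the real site
early = collect_with_early_return(sample_links)
lazy = list(collect_with_yield(sample_links))
print(len(early), len(lazy))
```

The first function only ever sees the first link, while the generator produces one item for every link it is given.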

Your spider code should look like this; notice the changes I made:

from scrapy.spider import BaseSpider  # legacy name; newer Scrapy releases use scrapy.Spider
from scrapy.selector import HtmlXPathSelector # deprecated
from scrapy.selector import Selector
from ercessAssignment.items import ErcessassignmentItem

class MySpider(BaseSpider):
    name = "ercessSpider"
    allowed_domains = ["site_url"]
    start_urls = ["site_url"]

    def parse(self, response):
        hxs = Selector(response)
        links = hxs.xpath("//p")
        for linkk in links:
            item = ErcessassignmentItem()
            # the leading dot keeps the search relative to the current <p>
            item["link"] = linkk.xpath(".//a/@href").extract()
            # yield hands back one item per iteration instead of ending the loop
            yield item

# inside parse(self, response):
for href in response.xpath("//a/@href").extract():
    item = ErcessassignmentItem()
    item["link"] = href  # a single URL string, so each item becomes one CSV row
    yield item

This works well with a CSS selector too. If the other solutions are not working for you, you can try this.
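The difference between storing the whole list returned by extract() in one item and storing one link per item can be illustrated with the standard csv module. This is only a sketch of the effect, not Scrapy's own exporter code; the helper rows_to_csv and the sample links are hypothetical, and the comma join is an assumption about how a list-valued field ends up in a single cell.

```python
import csv
import io


def rows_to_csv(rows):
    """Write dict rows with a single 'link' column and return the CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["link"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


hrefs = ["/a", "/b", "/c"]  # hypothetical links for illustration

# One item holding every link: the values share a single cell in one row.
single_row = rows_to_csv([{"link": ",".join(hrefs)}])

# One item per link: each link lands on its own row.
one_per_row = rows_to_csv([{"link": href} for href in hrefs])

print(single_row)
print(one_per_row)
```

With one item per link, the CSV gets a header line followed by one line for each link, which is the layout the question asks for.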

Your code above does not print anything. Moreover, I don't see any .csv part. Also, the items list created in parse() will never be longer than 1 because of what looks like an indentation error to me (i.e. you return after the first iteration of the for loop). For better readability, you could use the for/else construct here:

def parse(self, response):
    hxs = Selector(response)
    links = hxs.xpath("//p")
    items = []
    for linkk in links:
        item = ErcessassignmentItem()
        item["link"] = linkk.xpath(".//a/@href").extract()  # leading dot: search inside this <p> only
        items.append(item)
    else:                               # after for loop is finished
        # either return items
        # or print link in items here without returning
        for link in items:              # take one link after another
            print(link)                 # and print each one on its own line
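For readers unfamiliar with for/else: the else block runs once the loop finishes without hitting break. The small sketch below shows both cases; the function name scan and its stop_at parameter are made up for illustration.

```python
def scan(values, stop_at=None):
    """Return a log of what the loop did, to show when else runs."""
    log = []
    for value in values:
        if value == stop_at:
            log.append("break")
            break
        log.append(value)
    else:
        log.append("else")  # reached only if the loop was not left via break
    return log


completed = scan([1, 2, 3])
interrupted = scan([1, 2, 3], stop_at=2)
print(completed)
print(interrupted)
```

Since the loop in parse() contains no break, its else block always runs, so it behaves the same as code placed directly after the loop.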


 