简体   繁体   中英

Python: Scrapy CSV exports incorrectly?

I am simply trying to write to a csv. However I have two separate for-statements, therefore the data from each for-statement exports independently and breaks order. Suggestions?

def parse(self, response):
        hxs = HtmlXPathSelector(response)
        titles = hxs.select('//td[@class="title"]')
        subtext = hxs.select('//td[@class="subtext"]')
        items = []
        for title in titles:
            item = HackernewsItem()
            item["title"] = title.select("a/text()").extract()
            item["url"] = title.select("a/@href").extract()
            items.append(item)
        for score in subtext:
            item = HackernewsItem()
            item["score"] = score.select("span/text()").extract()
            items.append(item)
        return items

As is apparent in the image below, the second for-statement prints below the others instead of "among" others as header does.

CSV image attached: CSV文件

and github link for full file: https://github.com/nchlswtsn/scrapy/blob/master/items.csv

Your order of exporting element is logical to what you find in CSV file, first you exported all the titles then all subtext elements.
I guess you are trying to scrap HN articles, here is my suggestion:

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    titles = hxs.select('//td[@class="title"]')
    items = []
    for title in titles:
        item = HackernewsItem()
        item["title"] = title.select("a/text()").extract()
        item["url"] = title.select("a/@href").extract()
        item["score"] = title.select('../td[@class="subtext"]/span/text()').extract()
        items.append(item)
    return items

I didn't test it, but it will give you an idea.

The CSV module from Python 2.7 does not support Unicode, so it's suggested to use unicodecsv instead.

$pip install unicodecsv

The unicodecsv is a drop-in replacement for Python 2's csv module which supports unicode strings without a hassle.

And then use this instead of import csv

import unicodecsv as csv

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM