Scrapy CSV output on multiple lines

Here's my spider:

from scrapy.spider import BaseSpider
from scrapy.selector import Selector
from ..items import TutorialItem

class Tutorial1(BaseSpider):
    name = "Tut"
    allowed_domains = ['nytimes.com']
    start_urls = ["http://nytimes.com",]

    def parse(self, response):
        sel = Selector(response)
        sites = sel.xpath('//div[@class="span-ab-layout layout"]')
        items = []

        for site in sites:
            item = TutorialItem()
            item['title'] = map(unicode.strip, site.select('//h2[@class="story-heading"]/a/text()').extract())
            item['time'] = map(unicode.strip, site.select('//time[@class="timestamp"]/text()').extract())
            yield item

Here is my output:

author time
By PETER BAKER,By JONATHAN M. KATZ and RICHARD PÉREZ-PEÃ'A,By NEIL MacFARQUHAR,By RON NIXON,By RICHARD GOLDSTEIN,By LOUISE STORY and ALEJANDRA XANIC von BERTRAB,By DAVID CARR,By AO SCOTT,By JERÉ LONGMAN,By THE EDITORIAL BOARD,By JON BECKMANN,By CJ HUGHES,By JOANNE KAUFMAN 10:26 AM ET,1:08 PM ET,11:57 AM ET,8:33 AM ET,10:01 AM ET,12:35 PM ET,1:47 PM ET,10:36 AM ET,10:26 AM ET,9:49 AM ET,12:05 PM ET,9:21 AM ET,12:22 PM ET,11:52 AM ET,8:59 AM ET

By PETER BAKER,By JONATHAN M. KATZ and RICHARD PÉREZ-PEÃ'A,By NEIL MacFARQUHAR,By RON NIXON,By RICHARD GOLDSTEIN,By LOUISE STORY and ALEJANDRA XANIC von BERTRAB,By DAVID CARR,By AO SCOTT,By JERÉ LONGMAN,By THE EDITORIAL BOARD,By JON BECKMANN,By CJ HUGHES,By JOANNE KAUFMAN 10:26 AM ET,1:08 PM ET,11:57 AM ET,8:33 AM ET,10:01 AM ET,12:35 PM ET,1:47 PM ET,10:36 AM ET,10:26 AM ET,9:49 AM ET,12:05 PM ET,9:21 AM ET,12:22 PM ET,11:52 AM ET,8:59 AM ET

I added the indentation to make it clear where the output is duplicated.

My problem is that when I export my results to CSV, everything always comes out in one giant row. It also produces a duplicate column for some reason. Can anyone help me with this?

I was able to find a fix by experimenting with:

hxs = HtmlXPathSelector(response)

Apparently, there is a huge difference between Selector and HtmlXPathSelector.
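
Whichever selector class is used, two details in the spider from the question probably contribute to the single giant row. The XPath expressions inside the loop start with `//`, which in XPath searches the whole document instead of the current `site` node. Each field is also assigned an entire list, and Scrapy's CSV export joins list values with commas into one cell. Below is a minimal sketch of the alternative (one record per story, queried relative to that story's node), written with only the Python 3 standard library so it runs without Scrapy. The `SAMPLE` markup, the `article` wrapper element, and the helper names `extract_rows` and `to_csv` are assumptions made up for illustration, not the real nytimes.com structure or any Scrapy API.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for the real page.
SAMPLE = """
<div class="layout">
  <article>
    <h2 class="story-heading"><a> First story </a></h2>
    <time class="timestamp"> 10:26 AM ET </time>
  </article>
  <article>
    <h2 class="story-heading"><a> Second story </a></h2>
    <time class="timestamp"> 1:08 PM ET </time>
  </article>
</div>
"""


def extract_rows(markup):
    """Return one (title, time) tuple per story block."""
    root = ET.fromstring(markup)
    rows = []
    for story in root.iter("article"):
        # Paths starting with "." are evaluated relative to `story`,
        # so each lookup sees only this story's own elements.
        title = story.findtext(".//h2[@class='story-heading']/a", default="")
        stamp = story.findtext(".//time[@class='timestamp']", default="")
        rows.append((title.strip(), stamp.strip()))
    return rows


def to_csv(rows):
    """Write a header plus one CSV line per row and return the text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["title", "time"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(extract_rows(SAMPLE)))
```

In a Scrapy spider the same idea would mean looping over the per-story nodes, using XPath expressions that begin with `.//` inside the loop, and yielding one item holding a single title and a single time on each iteration.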
