Simple Scrapy program runs successfully in the shell but does not export data to CSV

I have been trying to scrape only the comments from a particular link. When I run it in the Scrapy shell it works, but when I try to export the results to a CSV file I only get comment_user, not comment_data. Why?

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.selector import Selector
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from urlparse import urljoin
from commen.items import CommenItem

class criticspider(CrawlSpider):
    name ="delh"
    allowed_domains =["consumercomplaints.in"]
    #start_urls =["http://www.consumercomplaints.in/?search=delhivery&page=2","http://www.consumercomplaints.in/?search=delhivery&page=3","http://www.consumercomplaints.in/?search=delhivery&page=4","http://www.consumercomplaints.in/?search=delhivery&page=5","http://www.consumercomplaints.in/?search=delhivery&page=6","http://www.consumercomplaints.in/?search=delhivery&page=7","http://www.consumercomplaints.in/?search=delhivery&page=8","http://www.consumercomplaints.in/?search=delhivery&page=9","http://www.consumercomplaints.in/?search=delhivery&page=10","http://www.consumercomplaints.in/?search=delhivery&page=11"]
    start_urls=["http://www.consumercomplaints.in/movement-delivery/delhivery-courier-service-c783976"]

    def parse(self,response):

        sites = response.xpath('//table[@style="width:100%"]')
        items = []

        for site in sites:
            item = CommenItem()
            item['comment_user'] = site.xpath('.//td[@class="comments"]/div[1]/a/text()').extract()
            item['comment_data'] = site.xpath('.//tr[3]/td/div/text()').extract()
            items.append(item)
        return items
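For context on what the export does with these fields: `extract()` always returns a list, and when an XPath such as `.//tr[3]/td/div/text()` matches nothing inside a given `table`, that list is empty and ends up as a blank CSV cell. The sketch below imitates this with the standard `csv` module rather than Scrapy itself. The comma join loosely mirrors the `join_multivalued=','` default of Scrapy's `CsvItemExporter`; the helper name `to_csv` and the sample values are hypothetical.

```python
import csv
import io


def to_csv(items, fields):
    """Write dict items as CSV text, joining list values with ','
    (a rough imitation of Scrapy's CsvItemExporter default)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(fields)  # header line
    for item in items:
        row = []
        for field in fields:
            value = item.get(field, '')
            if isinstance(value, list):
                value = ','.join(value)  # an empty list becomes ''
            row.append(value)
        writer.writerow(row)
    return buf.getvalue()


# Hypothetical item: the user XPath matched, the comment XPath matched nothing.
items = [{'comment_user': ['someuser'], 'comment_data': []}]
# Prints the header row, then "someuser," with an empty comment_data cell.
print(to_csv(items, ['comment_user', 'comment_data']))
```

So a column that is empty in the CSV usually means the corresponding XPath returned no text nodes for that row, not that the export itself failed.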

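The logic in the parse() method is a bit off. I'd iterate over each comment block (a div inside a td whose id starts with "c") and collect all of the text nested inside the comment div with //text(), rather than only its direct text() nodes: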

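# Method of the criticspider class from the question
def parse(self, response):
    # One selector per comment: a <div> inside a <td> whose id starts with "c"
    sites = response.xpath('//td/div[starts-with(@id, "c")]')
    for site in sites:
        item = CommenItem()
        # Take the first match; fall back to '' instead of raising IndexError
        user = site.xpath('.//td[@class="comments"]/div[1]/a/text()').extract()
        item['comment_user'] = (user or [''])[0].strip()
        # //text() gathers every nested text node; join them into one string
        item['comment_data'] = ''.join(site.xpath('.//td[@class="compl-text"]/div//text()').extract()).strip()
        # Yield items one at a time instead of building and returning a list
        yield item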
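The switch from text() to //text() is what brings the comment body back: text() returns only the text nodes that are direct children of the div, while //text() returns every text node underneath it, which is why the answer joins the pieces. Here is a standard-library illustration of the same distinction using `xml.etree.ElementTree` instead of Scrapy selectors. The helpers `direct_text` and `all_text` are hypothetical names, and the markup is invented; it is only assumed to resemble a comment block on the real page.

```python
import xml.etree.ElementTree as ET


def direct_text(elem):
    """Text nodes that are direct children of elem (like XPath 'text()').
    In ElementTree these are elem.text plus the .tail of each child."""
    parts = [elem.text] + [child.tail for child in elem]
    return [p for p in parts if p]


def all_text(elem):
    """Every descendant text node of elem (like XPath './/text()')."""
    return list(elem.itertext())


# Invented markup; the real page structure may differ.
nested = ET.fromstring('<div><p>Parcel not delivered.</p></div>')
mixed = ET.fromstring('<div>Posted: <b>parcel lost</b> after two weeks.</div>')

print(direct_text(nested))           # []
print(''.join(all_text(nested)))     # Parcel not delivered.
print(''.join(direct_text(mixed)))   # Posted:  after two weeks.
print(''.join(all_text(mixed)))      # Posted: parcel lost after two weeks.
```

When the comment text sits entirely inside child elements, text() finds nothing at all, which matches the empty comment_data column described in the question.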

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please credit the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM