简体   繁体   中英

Empty list returning by xpath in scrapy

I am working on scrapy , i am trying to gather some data from a site ,

Spider Code

class NaaptolSpider(BaseSpider):
    name = "naaptol"
    domain_name = "www.naaptol.com"
    start_urls = ["http://www.naaptol.com/buy/mobile_phones/mobile_handsets.html"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        cell_matter = hxs.select('//div[@class="gridInfo"]/div[@class="gridProduct gridProduct_special"]')
        items=[]
        for i in cell_matter:
               cell_names = i.select('//p[@class="proName"]/a/text()').extract()
               prices = i.select('//p[@class="values"]/strong/text()').extract()
               item = ExampleItem()
               item['cell_name'] = cell_names
               item['price'] = prices
               items.append(item) 
        return [FormRequest(url="http://www.naaptol.com/faces/jsp/search/searchResults.jsp",
            formdata={'type': 'cat_catlg',
            'catid': '27',
            'sb' : '9,8',
            'frm' : '1',
            'max' : '15',
            'req': 'ajax'
            },
            callback=self.parse_item
            )]

def parse_item(self, response):
     hxs = HtmlXPathSelector(response) 
     cell_matter = hxs.select('//div[@class="gridInfo"]/div[@class="gridProduct gridProduct_special"]')
     for i in cell_matter:
               cell_names = i.select('//p[@class="proName"]/a/text()').extract()
               prices = i.select('//p[@class="values"]/strong/text()').extract()
               print cell_names
               print prices 

Result:

2012-06-15 09:38:36+0530 [naaptol] DEBUG: Crawled (200) <POST http://www.naaptol.com/faces/jsp/search/searchResults.jsp> (referer: http://www.naaptol.com/buy/mobile_phones/mobile_handsets.html)
[]
[]

Actually i had posted the form to achieve the pagination which is in javascript

Here i am receiving the response from parse method in parse_item method, but when i used the xpath same as in parse method its returning an empty list as above, can anyone tell me why its returning an empty array, and whats wrong in my code.

Thanks in advance

The response is in JSON format:

{
  "prodList": [
    {
      "pid": "955492",
      "pnm": "Samsung Star 3 Duos",
      "mctid": "27",
      "pc": "5,650",
      "mrp": "6290",
      "pdc": "10",
      "pimg": "Samsung-Star-3-duos-1.jpg",
      "rt": "8",
      "prc": "1",
      "per": "Y",
      (...)
    },
    (...)
}

In order to parse it, you can use python's json module. An example of what you are trying to achieve is here: Empty list for hrefs to achieve pagination through JavaScript onclick functions .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM