Xpath selector in python Scrapy

Question

I am currently learning how to use XPath together with Python Scrapy to scrape websites, and I am stuck on the following:

I am looking at a Dutch website, http://www.ah.nl/producten/bakkerij/brood, from which I want to scrape the product names.

Eventually I want a CSV file with the names of all these breads. When I inspect the elements, I can see where these names are defined.

I need the right XPath to extract "AH Tijgerbrood bruin heel". This is what I thought my spider should look like:

import scrapy
from stack.items import DmozItem

class DmozSpider(scrapy.Spider):
    name = "ah"
    allowed_domains = ["ah.nl"]
    start_urls = ['http://www.ah.nl/producten/bakkerij/brood']

    def parse(self, response):
        # One selector per product block, matched on its exact class attribute
        for sel in response.xpath('//div[@class="product__description small-7 medium-12"]'):
            item = DmozItem()
            # Text of the <h1> directly inside that block (relative XPath)
            item['title'] = sel.xpath('h1/text()').extract()
            yield item

Now, if I crawl with this spider, I don't get any results. I have no idea what I am missing here.

Answer 1

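Scrapy only downloads the HTML the server sends and does not run JavaScript. Since all the product elements on this page are loaded by JavaScript, you would have to use a browser automation tool such as Selenium for this task: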

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("http://www.ah.nl/producten/bakkerij/brood")
# Put an arbitrarily large number (you can tone it down): with an implicit wait,
# each find call below polls for up to 40 seconds until at least one
# JavaScript-rendered element matches.
driver.implicitly_wait(40)
elements = driver.find_elements(By.XPATH, '//*[local-name()= "div" and @class="product__description small-7 medium-12"]//*[local-name()="h1"]')
for elem in elements:
    print(elem.text)
driver.quit()
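The question ultimately asks for a CSV file. When the names come from Selenium, as in this answer, Python's standard csv module is enough; the sketch below assumes a hypothetical output file brood.csv with a single title column. (With a Scrapy spider that actually yields items, `scrapy crawl ah -o brood.csv` writes them through Scrapy's feed exports instead.)

```python
import csv


def write_titles_csv(titles, path):
    """Write one product name per row under a 'title' header."""
    # newline="" is what the csv module expects for files it writes.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["title"])
        for title in titles:
            writer.writerow([title])


# In the Selenium code this list would be [elem.text for elem in elements].
titles = ["AH Tijgerbrood bruin heel"]
write_titles_csv(titles, "brood.csv")
```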

Answer 2

title = response.xpath('//div[@class="product__description small-7 medium-12"]/h1/text()').extract()[0]

2 answers

solution1
1 2015-08-06 23:39:04

solution2
0 2016-02-08 12:30:00