Using Scrapy to scrape data

Question

I am trying to scrape data using Scrapy, but I am having trouble editing the code. Here is what I have done as an experiment:

import scrapy

class BlogSpider(scrapy.Spider):
    name = 'blogspider'
    start_urls = ['http://anon.example.com/']

    def parse(self, response):
        for title in response.css('h2'):
            yield {'Agent-name': title.css('a ::text').extract_first()}

        next_page = response.css('li.col-md-3 ln-t > div.cs-team team-grid > figure > a ::attr(href)').extract_first()
        if next_page:
            yield scrapy.Request(response.urljoin(next_page), callback=self.parse)

I took the example from the scrapy.org website and tried to modify it. This code extracts the names of all the agents from the given page.
But I want Scrapy to go into each agent's page and extract the agent's information from there.
For example:

Name: name of the agent
Phone: Phone Number
Email: email address
Website: URL of website, etc.

I hope this clarifies my problem. I would appreciate a solution.

Answer 1

import scrapy


class BlogSpider(scrapy.Spider):
    name = 'blogspider'
    start_urls = ['http://anon.example.com']

    # Step 1: collect the detail-page URLs (502 of them) linked from the agent names.
    def parse(self, response):
        info_urls = response.xpath('//div[@class="text"]//a/@href').extract()
        for info_url in info_urls:
            # urljoin() makes relative links absolute; absolute links pass through unchanged.
            yield scrapy.Request(url=response.urljoin(info_url), callback=self.parse_info)

    # Step 2: visit each detail page and extract the agent's info.
    def parse_info(self, response):
        info = {}
        info['name'] = response.xpath('//h2/text()').extract_first()
        info['phone'] = response.xpath('//text()[contains(.,"Phone:")]').extract_first()
        info['email'] = response.xpath('//*[@class="cs-user-info"]/li[1]/text()').extract_first()
        info['website'] = response.xpath('//*[@class="cs-user-info"]/li[2]/a/text()').extract_first()
        # Yield the item (instead of printing it) so Scrapy can collect and export it.
        yield info

The name can be found on the detail page, so in the first step we just collect all the detail-page URLs.
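
One detail worth checking in this first step: href values on a listing page are often relative. Below is a minimal sketch, using only the standard library, of how such links resolve against the page URL. The listing URL and the paths (/agents/jane-doe and so on) are made up for illustration and are not taken from the real site. Scrapy's response.urljoin(href) performs the same kind of resolution against the response's own URL.

```python
from urllib.parse import urljoin

# Hypothetical listing-page URL and href values; the real site's paths may differ.
LISTING_URL = 'http://anon.example.com/agents/'
HREFS = ['/agents/jane-doe', 'john-roe', 'http://anon.example.com/agents/sam-poe']


def absolute_urls(base_url, hrefs):
    """Resolve each href against base_url, the way a browser would."""
    return [urljoin(base_url, href) for href in hrefs]


detail_urls = absolute_urls(LISTING_URL, HREFS)
for url in detail_urls:
    print(url)
```

Root-relative, page-relative, and already-absolute links all come out as absolute URLs, which is the form a request needs.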

Then we visit each URL and extract all the info.

The data may need clean-up, but the idea is clear.
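
As a rough sketch of what that clean-up could look like, the helper below trims whitespace and drops a leading label such as "Phone:" when it matches the field name. The function name clean_info and the sample values are hypothetical, and the real pages may well need different rules.

```python
import re


def clean_info(info):
    """Return a copy of info with whitespace collapsed and field labels removed.

    A value of None (what extract_first() returns when nothing matches)
    is kept as None.
    """
    cleaned = {}
    for key, value in info.items():
        if value is None:
            cleaned[key] = None
            continue
        # Remove a leading label equal to the field name, e.g. "Phone:" for 'phone'.
        # Matching on the field name avoids touching values such as "http://...".
        label = re.compile(r'^\s*' + re.escape(key) + r'\s*:\s*', re.IGNORECASE)
        value = label.sub('', value)
        # Collapse runs of whitespace, including newlines, into single spaces.
        cleaned[key] = ' '.join(value.split())
    return cleaned


# Hypothetical raw values, shaped like what the XPath expressions might return.
raw = {
    'name': '  Jane Doe\n',
    'phone': '\n  Phone: 555 0100  ',
    'email': ' jane@example.com ',
    'website': None,
}
print(clean_info(raw))
```

In the spider, the detail-page callback could pass its dict through clean_info before emitting it.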

solution1
1 ACCEPTED 2017-01-30 13:52:05
