
No data after scraping a website

I want to scrape all names from a website https://www.internationaltelecomsweek.com using Scrapy.

This is in the item file.

import scrapy
class ItwItem(scrapy.Item):
    name = scrapy.Field()

This is my spider.

import scrapy
from itw.items import ItwItem
class ItwSpider(scrapy.Spider):
    name = 'itw'

    allowed_domains = ['https://www.internationaltelecomsweek.com']

    start_urls = ['https://www.internationaltelecomsweek.com/this-year/companies-attending']

    def parse(self, response):
        data = json.loads(response.body)
        for i in data:
            item["name"] = i["DisplayName"]
        return item

When I run Scrapy I only get a blank CSV file. What am I doing wrong?

It seems that the list of attendees is dynamically generated, with each attendee returned as a JSON object.

Load the site in the scrapy shell and run view(response) to see what your spider can actually read. You'll see that the page contains none of the attendee data that is visible in your browser.

If you check the network tab to see which requests are sent to the server, you'll see that the list of attendees is returned from this URL, each attendee as a JSON object.

What you'll have to do is: request the URL that yields the JSON objects (use it as your start_urls instead of the HTML page) and decode the response with the json module:

import json

def parse(self, response):
    data = json.loads(response.body)
    for i in data:
        item = ItwItem()
        item["name"] = i["DisplayName"]
        yield item
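To see the extraction step in isolation, here is a minimal sketch assuming the endpoint returns a JSON array of objects that each have a "DisplayName" key (the sample payload below is hypothetical, made up to match that shape):

```python
import json

# Hypothetical sample payload, shaped like the attendee endpoint's response:
# a JSON array of objects with a "DisplayName" key.
body = '[{"DisplayName": "Acme Telecom"}, {"DisplayName": "Globex"}]'

data = json.loads(body)
names = [entry["DisplayName"] for entry in data]
print(names)  # ['Acme Telecom', 'Globex']
```

In the spider, response.body takes the place of body, and each extracted name is stored in an ItwItem and yielded so Scrapy's feed exporter can write it to the CSV.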
