
No data after scraping a website

I want to scrape all the names from the website https://www.internationaltelecomsweek.com using Scrapy.

This is my items file.

import scrapy
class ItwItem(scrapy.Item):
    name = scrapy.Field()

This is my spider.

import scrapy
from itw.items import ItwItem

class ItwSpider(scrapy.Spider):
    name = 'itw'
    allowed_domains = ['https://www.internationaltelecomsweek.com']
    start_urls = ['https://www.internationaltelecomsweek.com/this-year/companies-attending']

    def parse(self, response):
        data = json.loads(response.body)
        for i in data:
            item["name"] = i["DisplayName"]
        return item

When I run Scrapy I only get a blank CSV file. What am I doing wrong?

It seems that the list of attendees is generated dynamically, with each attendee returned as a JSON object.

Load the site in the Scrapy shell and call view(response) to see what your spider can actually read. You'll see that the page comes back without the attendee list, even though your browser displays the attendees.
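A quick way to confirm the same thing in code is to test whether a response body parses as JSON at all. The following is only a minimal sketch using the standard library; `parses_as_json` and both sample strings are hypothetical stand-ins, not part of Scrapy or of the site's real output.

```python
import json


def parses_as_json(body: str) -> bool:
    """Return True if body is valid JSON, False otherwise."""
    try:
        json.loads(body)
    except json.JSONDecodeError:
        return False
    return True


# Hypothetical samples: an HTML shell like the listing page, and a JSON payload
html_body = "<html><body><div id='attendees'></div></body></html>"
json_body = '[{"DisplayName": "Example Telecom"}]'

print(parses_as_json(html_body))  # False
print(parses_as_json(json_body))  # True
```

In the Scrapy shell, a check like this on response.text for the listing page would be expected to return False, which is consistent with json.loads failing inside parse and no items reaching the CSV.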

If you check the browser's network tab to see which requests are sent to the server, you'll see that the list of attendees is fetched from this URL, each attendee as a JSON object.

What you'll have to do is request the URL that returns the JSON objects and process them with the json module in Scrapy:

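import json

def parse(self, response):
    # response must come from the JSON URL, not the HTML listing page
    data = json.loads(response.text)
    for i in data:
        item = ItwItem()  # new item for each attendee
        item["name"] = i["DisplayName"]
        yield item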
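The JSON handling can also be tried outside Scrapy, which makes it easy to test before running a crawl. This is a minimal sketch, assuming the endpoint returns a JSON array of objects with a `DisplayName` key as described above; `extract_names` and the sample payload are hypothetical.

```python
import json
from typing import Dict, Iterator


def extract_names(body: str) -> Iterator[Dict[str, str]]:
    """Yield one {"name": ...} dict per attendee in a JSON array."""
    for entry in json.loads(body):
        # Skip entries without a DisplayName instead of raising KeyError
        name = entry.get("DisplayName")
        if name:
            yield {"name": name}


# Hypothetical payload shaped like the attendee response
sample = '[{"DisplayName": "Example Telecom"}, {"Id": 7}, {"DisplayName": "Acme Carrier"}]'
names = list(extract_names(sample))
print(names)  # [{'name': 'Example Telecom'}, {'name': 'Acme Carrier'}]
```

Inside a spider, parse could then do `yield from extract_names(response.text)`, since Scrapy accepts plain dicts as items.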

