Grouping corresponding data into a single dictionary when scraping

Question

I am trying to scrape the country name, GDP and population from this website. I am using Scrapy with Python 3.7. The problem is that I get all the country names in one list, all the GDP values in another and all the population values in a third, all packed into one dictionary. What I want is one dictionary per country, holding that country's name, GDP and population.

Here is my code:

import scrapy


class DebtByCountriesSpider(scrapy.Spider):
    name = 'debt_by_countries'
    # allowed_domains should list bare domain names, not URLs with paths
    allowed_domains = ['worldpopulationreview.com']
    start_urls = ['https://worldpopulationreview.com/countries/countries-by-national-debt/']

    def parse(self, response):
        # Each getall() returns a whole column of the table as a list
        countries = response.xpath("//tbody/tr/td/a/text()").getall()
        GDP = response.xpath("//tbody/tr/td[2]/text()").getall()
        population = response.xpath("//tbody/tr/td[3]/text()").getall()

        # A single yield: one item whose values are the three complete lists
        yield {
            "country_name": countries,
            "GDP": GDP,
            "population": population,
        }

Here is the output of my code:

But this is what I want (including the population):
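The problem can be reproduced without Scrapy or network access. In this minimal sketch, three hypothetical lists with placeholder values (not real GDP or population figures) stand in for the three getall() results, and parse_like_question is an illustrative name for a function that mimics the spider's single yield.

```python
# Hypothetical stand-ins for the three .getall() results (placeholder values).
countries = ["Country A", "Country B", "Country C"]
GDP = ["100", "200", "300"]
population = ["10", "20", "30"]


def parse_like_question(countries, GDP, population):
    """Mimics the spider's parse(): a single yield containing three whole lists."""
    yield {
        "country_name": countries,
        "GDP": GDP,
        "population": population,
    }


items = list(parse_like_question(countries, GDP, population))
print(len(items))       # → 1 (one item for the entire table)
print(items[0]["GDP"])  # → ['100', '200', '300'] (the whole GDP column)
```

Because there is only one yield, Scrapy receives exactly one item, and each field holds an entire column.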

Answer 1

Using zip, we can create a dictionary for each country and yield it from there.

# Inside parse(), after the three getall() calls: one dict per country
for country, gdp, pop in zip(countries, GDP, population):
    yield {"country_name": country, "GDP": gdp, "population": pop}

Your code doesn't work because the generator yields a single huge dictionary whose values are the entire lists countries, GDP and population. To fix this, create one dictionary per country and yield them one at a time, so that each next call produces the next country, as in the loop above.
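One caveat: zip pairs items purely by position and stops at the shortest list. If one column has a missing or extra entry (for example, a row whose country cell has no link, so //tbody/tr/td/a/text() returns fewer names than there are rows), every later country is paired with the wrong GDP and population. A more robust pattern is to select each table row first and read its cells relative to that row; in Scrapy this would look like iterating over response.xpath("//tbody/tr") and calling row.xpath("./td[2]/text()").get() on each row. So that it runs without Scrapy, the sketch below uses the standard library's xml.etree.ElementTree on a small hypothetical, well-formed table instead. The real page's markup may differ, and real-world HTML usually needs an HTML parser such as Scrapy's own selectors. rows_to_items and the placeholder values are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed table (placeholder values). "Country B" has no <a>,
# so the question's //tbody/tr/td/a/text() would return only two names here.
HTML = """
<table>
  <tbody>
    <tr><td><a>Country A</a></td><td>100</td><td>10</td></tr>
    <tr><td>Country B</td><td>200</td><td>20</td></tr>
    <tr><td><a>Country C</a></td><td>300</td><td>30</td></tr>
  </tbody>
</table>
"""


def rows_to_items(html):
    """Yield one dict per <tr>, reading each cell relative to its own row."""
    root = ET.fromstring(html.strip())
    for row in root.iter("tr"):
        cells = row.findall("td")
        if len(cells) < 3:
            continue  # skip rows without the expected three columns
        # itertext() collects text from the cell and any nested <a>,
        # so a missing link cannot shift values into the wrong row.
        name, gdp, pop = ("".join(c.itertext()).strip() for c in cells[:3])
        yield {"country_name": name, "GDP": gdp, "population": pop}


for item in rows_to_items(HTML):
    print(item)
```

Each dictionary is built from a single row, so the three fields always belong to the same country, even when one cell is formatted differently.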

To test the generator, try:

gen = parse(response) # or self.parse(response) depending on context
print(next(gen))
print(next(gen))

Each time next is called, the generator yields a new dictionary for the next country.
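The same behaviour can be checked outside a spider with a plain generator. In this sketch, parse_rows is a hypothetical stand-in for parse() that takes the three lists directly, and the values are placeholders. It also shows what happens once every country has been yielded. Inside a real crawl you would not call next yourself, because Scrapy iterates over the callback's results.

```python
def parse_rows(countries, GDP, population):
    """Hypothetical stand-in for parse(): one dict per country via zip."""
    for country, gdp, pop in zip(countries, GDP, population):
        yield {"country_name": country, "GDP": gdp, "population": pop}


# Placeholder values, not real data.
gen = parse_rows(["Country A", "Country B"], ["100", "200"], ["10", "20"])
first = next(gen)   # {'country_name': 'Country A', 'GDP': '100', 'population': '10'}
second = next(gen)  # {'country_name': 'Country B', 'GDP': '200', 'population': '20'}

# The generator is now exhausted: a plain next(gen) would raise StopIteration,
# while next(gen, None) returns the default None instead of raising.
leftover = next(gen, None)
print(first, second, leftover)
```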

Answer 1: 1 vote, accepted, 2020-04-24 05:36:11

抓取时在单个字典中分组相同的数据

问题描述

1 个解决方案

解决方案1 1 已采纳 2020-04-24 05:36:11

解决方案1
1 已采纳 2020-04-24 05:36:11