Grouping same data in a single dictionary while scraping

I am trying to scrape the country name, GDP, and population from this website. I am using Scrapy with Python 3.7. The problem is that I get a single dictionary holding all the country names together, all the GDP values together, and all the population values together. What I want instead is one dictionary per country, containing that country's name, GDP, and population.

Here is my code:

import scrapy

class DebtByCountriesSpider(scrapy.Spider):
    name = 'debt_by_countries'
    allowed_domains = ['worldpopulationreview.com/countries/countries-by-national-debt']
    start_urls = ['https://worldpopulationreview.com/countries/countries-by-national-debt/']

    def parse(self, response):

        # countries = response.xpath("//td/a/text()").getall()

        countries = response.xpath("//tbody/tr/td/a/text()").getall()
        GDP = response.xpath("//tbody/tr/td[2]/text()").getall()
        population = response.xpath("//tbody/tr/td[3]/text()").getall()


        yield {
            "country_name": countries,
            "GDP": GDP,
            "population": population,
        }

Here is the output of my code:

[Image: output 1]

But this is what I want (including the population):

[Image: what I want]

Using zip, we can build one dictionary per country and yield each one in turn.

for country, gdp, pop in zip(countries, GDP, population):
    yield {"country_name": country, "GDP": gdp, "population": pop}

Your code doesn't work as intended because the generator yields just one huge dictionary whose values are the entire lists countries, GDP, and population, respectively. To remedy this, create a dictionary for each country and yield one per next call, as shown above.
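The difference between the two shapes can be reproduced without Scrapy. The following is only a minimal sketch: the three lists hold made-up placeholder values (not data from the website), and one_big_dict and per_country are hypothetical helper names that mimic the original parse and the zip-based fix.

```python
# Placeholder values standing in for the three lists returned by .getall();
# they are illustrative only, not data scraped from the site.
countries = ["Country A", "Country B", "Country C"]
GDP = ["gdp-a", "gdp-b", "gdp-c"]
population = ["pop-a", "pop-b", "pop-c"]


def one_big_dict(countries, GDP, population):
    """Mimics the original parse(): yields a single dict of whole lists."""
    yield {"country_name": countries, "GDP": GDP, "population": population}


def per_country(countries, GDP, population):
    """Mimics the zip-based fix: yields one dict per country."""
    for country, gdp, pop in zip(countries, GDP, population):
        yield {"country_name": country, "GDP": gdp, "population": pop}


original_items = list(one_big_dict(countries, GDP, population))
fixed_items = list(per_country(countries, GDP, population))

print(len(original_items))  # 1
print(len(fixed_items))     # 3
print(fixed_items[0])
```

Note that zip stops at the shortest input, so if one of the XPath expressions matches fewer cells than the others, the trailing rows are dropped without any error. It is worth checking that the three lists have the same length before relying on this approach.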

To test the generator, try:

gen = parse(response) # or self.parse(response) depending on context
print(next(gen))
print(next(gen))

Each time next is called, the generator yields a different dictionary, corresponding to the next country.
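For readers less familiar with next, here is a small sketch of how such a generator is consumed. fake_parse is a hypothetical stand-in for the spider's parse method and its rows are placeholders. In a real crawl Scrapy iterates over whatever parse yields, so calling next by hand is only useful for inspection.

```python
def fake_parse():
    """Hypothetical stand-in for parse(): yields one dict per row."""
    rows = [("Country A", "gdp-a", "pop-a"), ("Country B", "gdp-b", "pop-b")]
    for country, gdp, pop in rows:
        yield {"country_name": country, "GDP": gdp, "population": pop}


gen = fake_parse()
first = next(gen)
second = next(gen)
print(first)
print(second)

# Once every row has been yielded, next() raises StopIteration;
# passing a default value avoids the exception.
leftover = next(gen, None)
print(leftover)  # None

# list() drains a fresh generator in one go.
all_items = list(fake_parse())
print(len(all_items))  # 2
```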
