Grouping same data in a single dictionary while scraping
I am trying to scrape the country name, GDP and population from this website. I am using Scrapy with Python 3.7. The problem is that I get all the country names, all the GDP values and all the population values lumped together in a single dictionary. What I want is one dictionary per country, holding that country's name, GDP and population.
Here is my code:
import scrapy


class DebtByCountriesSpider(scrapy.Spider):
    name = 'debt_by_countries'
    allowed_domains = ['worldpopulationreview.com/countries/countries-by-national-debt']
    start_urls = ['https://worldpopulationreview.com/countries/countries-by-national-debt/']

    def parse(self, response):
        # countries = response.xpath("//td/a/text()").getall()
        countries = response.xpath("//tbody/tr/td/a/text()").getall()
        GDP = response.xpath("//tbody/tr/td[2]/text()").getall()
        population = response.xpath("//tbody/tr/td[3]/text()").getall()
        yield {
            "country_name": countries,
            "GDP": GDP,
            "population": population
        }
Here is the output of my code:
But this is what I want (including the population):
Using zip, we can build one dictionary per country and yield each of them in turn.
for country, gdp, pop in zip(countries, GDP, population):
    yield {"country_name": country, "GDP": gdp, "population": pop}
The reason your code doesn't work is that the generator yields just one huge dictionary, in which the three values are the entire lists countries, GDP and population, respectively. To remedy this, create a dictionary for each country and yield one of them per next call, as shown above.
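As a self-contained illustration of the per-country approach, here is a minimal sketch that runs without Scrapy. The function name records_from_columns and the sample values are made up for this example; the three lists stand in for the results of the .getall() calls. The length check is an extra safeguard not present in the original answer: zip stops at the shortest input, so a missing table cell would otherwise drop or misalign rows without any warning.

```python
def records_from_columns(countries, gdp, population):
    """Yield one dictionary per country from three parallel lists."""
    # zip stops at the shortest input, so fail loudly if the columns
    # do not line up instead of silently losing rows.
    if not (len(countries) == len(gdp) == len(population)):
        raise ValueError("column lists have different lengths")
    for country, gdp_value, pop in zip(countries, gdp, population):
        yield {"country_name": country, "GDP": gdp_value, "population": pop}


# Placeholder values (not real data) standing in for the scraped columns.
countries = ["Country A", "Country B"]
gdp = ["100", "200"]
population = ["10", "20"]

records = list(records_from_columns(countries, gdp, population))
print(records)
```

Each element of records is a separate dictionary, which is the shape Scrapy would export as one item per country.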
To test the generator, try:
gen = parse(response) # or self.parse(response) depending on context
print(next(gen))
print(next(gen))
Each time next is called, the generator yields a different dictionary, corresponding to the next country.
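To try the next behaviour without a live response object, the sketch below uses a hypothetical stand-in generator, parse_rows, with placeholder values in place of scraped data. It also shows one way to handle exhaustion: passing a default to next returns that default instead of raising StopIteration once no countries are left.

```python
def parse_rows():
    """Hypothetical stand-in for the spider's parse method."""
    # Placeholder values (not real data) in place of the XPath results.
    countries = ["Country A", "Country B"]
    gdp = ["100", "200"]
    population = ["10", "20"]
    for country, gdp_value, pop in zip(countries, gdp, population):
        yield {"country_name": country, "GDP": gdp_value, "population": pop}


gen = parse_rows()
first = next(gen)         # dictionary for the first country
second = next(gen)        # dictionary for the second country
third = next(gen, None)   # generator is exhausted, so the default is returned

print(first)
print(second)
print(third)
```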