Calculate the growth rate from the first value in each column of a pandas DataFrame and return a NumPy array

Question

I need a bit of help with my function.

This is what I'm trying to do:

Build a predictive model that can give us the best guess at what the population growth rate in a given year might be. We will calculate the population growth rate as follows:

Because each year's growth rate needs the previous year's population, we can only calculate the growth rate from 1961 onwards.
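A minimal sketch of why 1960 drops out, assuming the years are string labels as in the sample data used in the answer: shifting the series by one year leaves no previous value for the first year, so its growth rate comes out as NaN.

```python
import pandas as pd

# ABW populations for 1960-1962, taken from the sample data in the answer
population = pd.Series({'1960': 54211.0, '1961': 55438.0, '1962': 56225.0})

# shift(1) moves each value one year forward; the first year has no predecessor
previous = population.shift(1)
growth = (population - previous) / previous

# 1960 is NaN because there is no 1959 value to compare against
print(growth)
```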

Write a function that takes the population_df and a country_code as input and computes the population growth rate for the given country starting from the year 1961. The function must return a 2-D NumPy array that contains the year and the corresponding growth rate for the country.

Function specifications:

The function should take a population_df and a country_code string as input and return a NumPy array as output. The array should have exactly two columns, the year and the population growth rate; in other words, it should have shape (?, 2), where ? is the length of the data.
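The (?, 2) shape requirement is easy to check before submitting. The helper below is hypothetical (not part of the assignment); it only verifies that the result is a 2-D array with two columns whose first year is 1961.

```python
import numpy as np

def is_valid_growth_array(arr) -> bool:
    """Hypothetical checker: True if arr has shape (n, 2) with n > 0 and starts at 1961."""
    if not isinstance(arr, np.ndarray) or arr.ndim != 2 or arr.shape[1] != 2:
        return False
    # Column 0 holds the year, column 1 the growth rate
    return arr.shape[0] > 0 and arr[0, 0] == 1961

# A (2, 2) array built from the first two rows of the expected output
sample = np.array([[1961.0, 0.02263], [1962.0, 0.01420]])
print(is_valid_growth_array(sample))         # True
print(is_valid_growth_array(sample[:, :1]))  # False: only one column
```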

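The growth rate for year t is:

$$\mathrm{growth\_rate}_t = \frac{\mathrm{population}_t - \mathrm{population}_{t-1}}{\mathrm{population}_{t-1}}$$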
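Applied to a single pair of consecutive years, the formula is one line of arithmetic. A minimal sketch; `growth_rate` is an illustrative helper name, not part of the assignment:

```python
def growth_rate(current: float, previous: float) -> float:
    """(population in year t - population in year t-1) / population in year t-1."""
    return (current - previous) / previous

# ABW, 1963 vs 1962 (values from the sample data in the answer)
print(growth_rate(56695.0, 56225.0))  # about 0.00836
```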


Input DataFrame head:

My code (changeable):

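def pop_growth_by_country_year(df,country_code):
    # One row of the DataFrame: a Series of populations indexed by year labels
    country_data = df.loc[country_code]
    # iteritems() was removed in pandas 2.0 (items() is the replacement)
    for columnName, columnData in country_data.iteritems():
        # Bug: columnData is a population value, not a year label, so these lookups fail;
        # the assignment also overwrites country_data on every iteration instead of collecting results
        country_data = ((country_data[columnData] - country_data[columnData-1]) / country_data[columnData-1])
    output = country_data.reset_index().to_numpy().reshape(-1, 2)
    return output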
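If you would rather keep the loop-based approach, a minimal corrected sketch could look like this. It assumes the year labels are strings such as '1960', as in the sample data in the answer. It iterates with `items()` (pandas 2.0 removed `iteritems()`), remembers the previous year's value instead of using a population value as a label, and collects rows in a list instead of overwriting `country_data`.

```python
import numpy as np
import pandas as pd

def pop_growth_by_country_year(df, country_code):
    # Row for one country: a Series indexed by year labels
    country_data = df.loc[country_code]
    rows = []
    previous_value = None
    # items() yields (label, value) pairs in column order
    for year, value in country_data.items():
        if previous_value is not None:
            # Growth relative to the previous year's population
            rows.append([float(year), (value - previous_value) / previous_value])
        previous_value = value
    # reshape(-1, 2) keeps the (n, 2) shape even when rows is empty
    return np.array(rows, dtype=float).reshape(-1, 2)

population_df = pd.DataFrame({
    '1960': {'ABW': 54211.0}, '1961': {'ABW': 55438.0}, '1962': {'ABW': 56225.0},
})
print(pop_growth_by_country_year(population_df, 'ABW'))
```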

Function call (not changeable):

pop_growth_by_country_year(population_df,'ABW')

Expected output:

array([[ 1.961e+03,  2.263e-02],
       [ 1.962e+03,  1.420e-02],
       [ 1.963e+03,  8.360e-03],
       [ 1.964e+03,  5.940e-03],
            ...       ....
       [ 2.015e+03,  5.260e-03],
       [ 2.016e+03,  4.610e-03],
       [ 2.017e+03,  4.220e-03]])

Answer 1

My input:

import pandas as pd

# Sample input: one row per country code, one column per year
population_df = pd.DataFrame({
    '1960': {'ABW': 54211.0, 'AFG': 8996351.0, 'AGO': 5643182.0, 'ALB': 1608800.0, 'AND': 13411.0},
    '1961': {'ABW': 55438.0, 'AFG': 9166764.0, 'AGO': 5753024.0, 'ALB': 1659800.0, 'AND': 14375.0},
    '1962': {'ABW': 56225.0, 'AFG': 9345868.0, 'AGO': 5866061.0, 'ALB': 1711319.0, 'AND': 15370.0},
    '1963': {'ABW': 56695.0, 'AFG': 9533954.0, 'AGO': 5980417.0, 'ALB': 1762621.0, 'AND': 16412.0}
})
population_df

My solution:

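def pop_growth_by_country_year(df,country_code):
    # Population of the country for every year (Series indexed by year labels)
    current_population = df.loc[country_code]
    # Same Series shifted one year forward, so each year lines up with the previous year's value
    previous_population = current_population.shift(1)
    growth = (current_population-previous_population)/previous_population
    # Drop the first year (NaN), move the year labels into a column, cast both columns to float
    return growth.dropna().reset_index().astype(float).to_numpy()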
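pandas also has a built-in for this exact ratio: `Series.pct_change()` computes (current - previous) / previous for each position. A variant of the solution above using it, as a sketch; it gives the same result on this sample data. If the real data has missing years, check how your pandas version's `pct_change` treats NaN, since its default filling behaviour was deprecated in pandas 2.1.

```python
import pandas as pd

def pop_growth_by_country_year(df, country_code):
    # pct_change() computes (value - previous value) / previous value along the Series
    growth = df.loc[country_code].pct_change()
    # Drop the first year (NaN), move year labels into a column, cast everything to float
    return growth.dropna().reset_index().astype(float).to_numpy()

# Subset of the sample input from the answer
population_df = pd.DataFrame({
    '1960': {'ABW': 54211.0, 'AFG': 8996351.0},
    '1961': {'ABW': 55438.0, 'AFG': 9166764.0},
    '1962': {'ABW': 56225.0, 'AFG': 9345868.0},
    '1963': {'ABW': 56695.0, 'AFG': 9533954.0},
})
print(pop_growth_by_country_year(population_df, 'ABW'))
```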

Output of pop_growth_by_country_year(population_df,'ABW'):

array([[1.96100000e+03, 2.26337828e-02],
       [1.96200000e+03, 1.41960388e-02],
       [1.96300000e+03, 8.35927079e-03]])

Note that because there is no previous population for the first year (1960 in this case), its growth rate cannot be computed. For this reason the output has one row fewer than the number of years in the input: len(output) == len(input) - 1.
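The same one-row-shorter result falls out naturally with plain NumPy, because `np.diff` returns one element fewer than its input. A minimal sketch using the ABW values from the sample input, assuming the years and populations are already extracted into arrays:

```python
import numpy as np

# ABW populations for 1960-1963 from the sample input
years = np.array([1960, 1961, 1962, 1963])
population = np.array([54211.0, 55438.0, 56225.0, 56695.0])

# np.diff gives population[1:] - population[:-1], one element fewer than the input
growth = np.diff(population) / population[:-1]

# Pair each growth value with its year (1961 onward) to get shape (n - 1, 2)
result = np.column_stack((years[1:], growth))
print(result.shape)  # (3, 2)
```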

Solution 1, accepted 2022-08-05 09:08:06
