How to interpolate yearly data to monthly frequency in Python with different aggregation levels?

Question

I have mid-year estimated population data as illustrated below:

The dataframe can be created using the code below:

import pandas as pd
df = pd.DataFrame({'Name':['WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
],
'Sex':['Male',
'Male',
'Female',
'Female',
'Male',
'Male',
'Female',
'Female',
'Male',
'Male',
'Female',
'Female',
'Male',
'Male',
'Female',
'Female',
'Male',
'Male',
'Female',
'Female',
],
'Age':['0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
],
'period':['2019/07/01',
'2019/07/01',
'2019/07/01',
'2019/07/01',
'2020/07/01',
'2020/07/01',
'2020/07/01',
'2020/07/01',
'2021/07/01',
'2021/07/01',
'2021/07/01',
'2021/07/01',
'2022/07/01',
'2022/07/01',
'2022/07/01',
'2022/07/01',
'2023/07/01',
'2023/07/01',
'2023/07/01',
'2023/07/01'],
'population':[21147.33972,
20435.77552,
20815.83029,
19908.72547,
21176.41455,
20678.62621,
20818.15366,
20166.97611,
21176.65456,
20819.50598,
20771.53888,
20316.90311,
21119.48584,
21024.48028,
20678.93492,
20525.76344,
21003.39475,
21219.41025,
20554.78559,
20706.95183,
]})

I want to convert it from yearly to monthly frequency for each Name, Sex, Age combination (i.e. a groupby) in a linear manner (equal steps): diff = (future mid-year estimate - current mid-year estimate) / 12, then add diff to the current mid-year estimate once for each successive month.
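As a small worked illustration of that formula, the sketch below applies it to a single group (Male, 0-4) using the 2019 and 2020 values from the dataframe above. The variable names `current`, `future`, `step` and `monthly` are only illustrative and are not part of the original post.

```python
# Two consecutive mid-year estimates for one group (Male, 0-4),
# copied from the question's dataframe.
current = 21147.33972  # 2019/07/01
future = 21176.41455   # 2020/07/01

# Equal monthly step between the two estimates.
step = (future - current) / 12

# k = 0 is July 2019, k = 1 is August 2019, ..., k = 12 is July 2020.
monthly = [current + k * step for k in range(13)]

for k, value in enumerate(monthly):
    print(k, round(value, 5))
```

The last element lands back on the next mid-year estimate, so consecutive years join up without a jump.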

I have done this in Excel and the result is:

I have seen it done in R, and I have seen examples using the .interpolate() function, but they do not handle data with multiple grouping levels. What would be the best way to do it?

Answer 1

Here is one way to do it with Pandas groupby, rolling, resample and interpolate:

# Setup
df["period"] = pd.to_datetime(df["period"])

# Get sub-dataframes, reshape and interpolate
dfs = []
for _, group in df.groupby(["Name", "Sex", "Age"]):
    # Iterate over windows of two consecutive yearly rows (loop variable renamed so it no longer shadows df)
    for df_ in group.rolling(2):
        if df_.shape[0] == 1:
            continue
        df_ = df_.set_index("period").resample("M").agg(list).applymap(lambda x: x[0] if x else pd.NA)
        df_[["Name", "Sex", "Age"]] = df_[["Name", "Sex", "Age"]].ffill()
        df_["population"] = pd.to_numeric(df_["population"]).interpolate(method="linear")
        dfs.append(df_)

# Concatenate sub-dataframes back into one
new_df = pd.concat(dfs).drop_duplicates().reset_index().reindex(["Name", "Sex", "Age", "period", "population"], axis=1)

Then print(new_df) shows the monthly rows, with population interpolated linearly between consecutive mid-year estimates.
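For comparison, here is a more compact sketch of the same idea that resamples each group once instead of looping over rolling windows. It is not the answer's method, only an alternative under stated assumptions: month-start labels ("MS") are used so the rows stay on the first of each month like the input, and a reduced four-row stand-in for the question's data keeps the example short.

```python
import pandas as pd

# Reduced stand-in for the question's data (assumption: same columns and date format).
df = pd.DataFrame({
    "Name": ["WC - West Coast District Municipality (DC1)"] * 4,
    "Sex": ["Male", "Female", "Male", "Female"],
    "Age": ["0-4"] * 4,
    "period": ["2019/07/01", "2019/07/01", "2020/07/01", "2020/07/01"],
    "population": [21147.33972, 20815.83029, 21176.41455, 20818.15366],
})
df["period"] = pd.to_datetime(df["period"], format="%Y/%m/%d")

keys = ["Name", "Sex", "Age"]
pieces = []
for key, group in df.groupby(keys):
    monthly = (
        group.set_index("period")["population"]
        .resample("MS")                 # one row per month start
        .asfreq()                       # new months are NaN
        .interpolate(method="linear")   # equal steps between known values
        .reset_index()
    )
    # Re-attach the group labels, which are constant within a group.
    for col, val in zip(keys, key):
        monthly[col] = val
    pieces.append(monthly)

new_df = pd.concat(pieces, ignore_index=True)[keys + ["period", "population"]]
print(new_df)
```

With method="linear", interpolate treats the rows as equally spaced, so each month moves by (next estimate - current estimate) / 12, matching the formula in the question.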

Answer 1 posted on 2022-12-05 09:54:51.