
How to interpolate yearly data to a monthly frequency in Python when it has different aggregate levels?

I have mid-year population estimates, as illustrated below:

[image: mid-year estimates]

The dataframe can be created using the code below:

import pandas as pd
df = pd.DataFrame({'Name':['WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
],
'Sex':['Male',
'Male',
'Female',
'Female',
'Male',
'Male',
'Female',
'Female',
'Male',
'Male',
'Female',
'Female',
'Male',
'Male',
'Female',
'Female',
'Male',
'Male',
'Female',
'Female',
],
'Age':['0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
],
'period':['2019/07/01',
'2019/07/01',
'2019/07/01',
'2019/07/01',
'2020/07/01',
'2020/07/01',
'2020/07/01',
'2020/07/01',
'2021/07/01',
'2021/07/01',
'2021/07/01',
'2021/07/01',
'2022/07/01',
'2022/07/01',
'2022/07/01',
'2022/07/01',
'2023/07/01',
'2023/07/01',
'2023/07/01',
'2023/07/01'],
'population':[21147.33972,
20435.77552,
20815.83029,
19908.72547,
21176.41455,
20678.62621,
20818.15366,
20166.97611,
21176.65456,
20819.50598,
20771.53888,
20316.90311,
21119.48584,
21024.48028,
20678.93492,
20525.76344,
21003.39475,
21219.41025,
20554.78559,
20706.95183,
]})

I want to convert it from yearly to monthly for each Name, Sex, Age combination (i.e. a groupby) in a linear manner (equal steps): diff = (future mid-year estimate - current mid-year estimate) / 12, then add diff to the current mid-year estimate once for each successive month.
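To make the rule concrete, here is a minimal sketch of the arithmetic for a single group (Male, 0-4), using the 2019 and 2020 values from the data above. The variable names are only illustrative.

```python
# Mid-year estimates for Male, 0-4, taken from the question's data
current = 21147.33972   # 2019/07/01
future = 21176.41455    # 2020/07/01

# Equal monthly step between the two mid-year estimates
diff = (future - current) / 12

# Jul 2019, Aug 2019, ..., Jul 2020: 13 points, 12 equal steps
monthly = [current + k * diff for k in range(13)]

for k, value in enumerate(monthly):
    print(k, round(value, 5))
```

The first value is the 2019 estimate and the last one lands on the 2020 estimate, so consecutive years join without a jump.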

I have done this in Excel, and the result is:

[image: monthly values computed in Excel]

I have seen it done in R, and I have seen examples using the .interpolate() function, but those examples do not handle data with multiple levels. What would be the best way to do it?

Here is one way to do it with pandas groupby, rolling, resample, and interpolate:

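# Setup: parse the period strings into datetimes
df["period"] = pd.to_datetime(df["period"])

# Split by group, then interpolate each pair of consecutive years
dfs = []
for _, group in df.groupby(["Name", "Sex", "Age"]):
    # Iterating over rolling(2) yields windows of up to two consecutive rows
    for window in group.rolling(2):
        if window.shape[0] == 1:
            continue  # the first window holds a single row: nothing to interpolate
        # "M" labels rows by month end (pandas >= 2.2 prefers "ME"); months without data become NA
        window = window.set_index("period").resample("M").agg(list).applymap(lambda x: x[0] if x else pd.NA)
        window[["Name", "Sex", "Age"]] = window[["Name", "Sex", "Age"]].ffill()
        window["population"] = pd.to_numeric(window["population"]).interpolate(method="linear")
        dfs.append(window)

# Concatenate the pieces; drop_duplicates removes the boundary rows shared by adjacent windows
new_df = pd.concat(dfs).drop_duplicates().reset_index().reindex(["Name", "Sex", "Age", "period", "population"], axis=1)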

Then print(new_df) outputs:

[image: output of print(new_df)]
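As an alternative to the rolling-window loop, the sketch below upsamples each group once with resample("MS").asfreq() and then fills the gaps with linear interpolation. It is not the answer's method, only a possible variant. It assumes every period falls on the first of a month, and it uses a small stand-in dataframe (two groups, three years) instead of the full data. The names keys, parts, and monthly_df are illustrative.

```python
import pandas as pd

# Small stand-in for the question's data: two groups, three mid-year points each
df = pd.DataFrame({
    "Name": ["WC - West Coast District Municipality (DC1)"] * 6,
    "Sex": ["Male", "Male", "Male", "Female", "Female", "Female"],
    "Age": ["0-4"] * 6,
    "period": ["2019/07/01", "2020/07/01", "2021/07/01"] * 2,
    "population": [21147.33972, 21176.41455, 21176.65456,
                   20815.83029, 20818.15366, 20771.53888],
})
df["period"] = pd.to_datetime(df["period"])

keys = ["Name", "Sex", "Age"]
parts = []
for key_values, group in df.groupby(keys):
    # One population series per group, indexed by date
    series = group.set_index("period")["population"].sort_index()
    # "MS" keeps first-of-month labels; asfreq() inserts NaN for the missing months
    monthly = series.resample("MS").asfreq().interpolate(method="linear")
    part = monthly.reset_index()
    # Re-attach the group labels that were dropped when selecting the series
    for col, val in zip(keys, key_values):
        part[col] = val
    parts.append(part)

monthly_df = pd.concat(parts, ignore_index=True)[keys + ["period", "population"]]
print(monthly_df.head(14))
```

With method="linear", pandas treats the rows as equally spaced, so each month moves by (future - current) / 12 regardless of how many days the month has, which matches the rule stated in the question.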
