How to interpolate yearly data to a monthly frequency in Python when the data has different aggregate levels?
The dataframe can be created with the following code:
import pandas as pd
df = pd.DataFrame({'Name':['WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
],
'Sex':['Male',
'Male',
'Female',
'Female',
'Male',
'Male',
'Female',
'Female',
'Male',
'Male',
'Female',
'Female',
'Male',
'Male',
'Female',
'Female',
'Male',
'Male',
'Female',
'Female',
],
'Age':['0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
],
'period':['2019/07/01',
'2019/07/01',
'2019/07/01',
'2019/07/01',
'2020/07/01',
'2020/07/01',
'2020/07/01',
'2020/07/01',
'2021/07/01',
'2021/07/01',
'2021/07/01',
'2021/07/01',
'2022/07/01',
'2022/07/01',
'2022/07/01',
'2022/07/01',
'2023/07/01',
'2023/07/01',
'2023/07/01',
'2023/07/01'],
'population':[21147.33972,
20435.77552,
20815.83029,
19908.72547,
21176.41455,
20678.62621,
20818.15366,
20166.97611,
21176.65456,
20819.50598,
20771.53888,
20316.90311,
21119.48584,
21024.48028,
20678.93492,
20525.76344,
21003.39475,
21219.41025,
20554.78559,
20706.95183,
]})
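I want to convert each Name, Sex, Age combination (i.e. a groupby) from yearly to monthly in a linear (proportional) way: diff = (future mid-year estimate - current mid-year estimate) / 12, then add that difference to the current mid-year estimate for each successive month.
I did this in Excel, and the result is:
I have seen this done in R, and I have seen examples that use the .interpolate() function, but they do not account for data with multiple levels. What is the best way to do this?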
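The rule described in the question can be checked by hand for a single group before reaching for pandas. The sketch below uses the Male, 0-4 estimates for 2019 and 2020 from the data above; the variable names `current`, `future`, `step` and `monthly` are illustrative only.

```python
# Mid-year estimates for Male, 0-4, copied from the dataframe above
current = 21147.33972  # 2019/07/01
future = 21176.41455   # 2020/07/01

# Monthly increment: (future mid-year estimate - current mid-year estimate) / 12
step = (future - current) / 12

# k = 0 is 2019/07 (the current estimate), k = 12 is 2020/07 (the future estimate)
monthly = [current + k * step for k in range(13)]

for k, value in enumerate(monthly):
    print(k, round(value, 5))
```

Because every month adds the same increment, this matches what a linear interpolation over equally spaced monthly points would produce for one group.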
Here is one way to do it using Pandas groupby, rolling, resample and interpolate:
# Setup
df["period"] = pd.to_datetime(df["period"])

# Get sub-dataframes, reshape and interpolate
dfs = []
for _, group in df.groupby(["Name", "Sex", "Age"]):
    # Iterate over rolling windows of two consecutive yearly rows
    for window in group.rolling(2):
        if window.shape[0] == 1:
            continue  # the first window holds a single row, nothing to interpolate
        # Upsample to monthly frequency; months without data become missing values
        window = window.set_index("period").resample("M").agg(list).applymap(lambda x: x[0] if x else pd.NA)
        window[["Name", "Sex", "Age"]] = window[["Name", "Sex", "Age"]].ffill()
        window["population"] = pd.to_numeric(window["population"]).interpolate(method="linear")
        dfs.append(window)

# Concatenate sub-dataframes back into one
new_df = pd.concat(dfs).drop_duplicates().reset_index().reindex(["Name", "Sex", "Age", "period", "population"], axis=1)
Then print(new_df)
Output:
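As a possible alternative to the rolling-window loop, the same idea can be sketched by resampling each group once and interpolating across all of its years. This is only a sketch under stated assumptions: `sample`, `to_monthly` and `monthly` are hypothetical names, `sample` is a two-group stand-in for the question's dataframe, and the "MS" (month start) frequency is chosen so the monthly dates fall on the first of each month like the original mid-year dates.

```python
import pandas as pd

# Small stand-in for the question's dataframe (two groups, two years each)
sample = pd.DataFrame({
    "Name": ["WC - West Coast District Municipality (DC1)"] * 4,
    "Sex": ["Male", "Male", "Female", "Female"],
    "Age": ["0-4", "0-4", "0-4", "0-4"],
    "period": ["2019/07/01", "2020/07/01", "2019/07/01", "2020/07/01"],
    "population": [21147.33972, 21176.41455, 20815.83029, 20818.15366],
})
sample["period"] = pd.to_datetime(sample["period"])


def to_monthly(group):
    # Upsample one group's yearly series to month starts, then fill the gaps linearly
    yearly = group.set_index("period")["population"]
    return yearly.resample("MS").asfreq().interpolate(method="linear")


frames = []
for (name, sex, age), group in sample.groupby(["Name", "Sex", "Age"]):
    # Turn the interpolated series back into columns and re-attach the group keys
    monthly_group = to_monthly(group).reset_index()
    monthly_group = monthly_group.assign(Name=name, Sex=sex, Age=age)
    frames.append(monthly_group)

monthly = pd.concat(frames, ignore_index=True)[["Name", "Sex", "Age", "period", "population"]]
print(monthly.head())
```

With `method="linear"` the rows are treated as equally spaced, so each month moves by one twelfth of the yearly change, which is the rule stated in the question. Interpolating within each group separately keeps values from one Name, Sex, Age combination from leaking into another.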