How to fill missing values using last available data from past months?
I have a dataframe like this:
Month/Year Value
01/2018 100
03/2018 200
06/2018 500
The values for 02/2018, 04/2018 and 05/2018 are missing because the value did not change in those months. I would like to have a dataframe which includes the missing months:
Month/Year Value
01/2018 100
02/2018 100
03/2018 200
04/2018 200
05/2018 200
06/2018 500
Can anyone help?
One way to do this:
# Note: pandas 2.2 and later prefer the alias "ME" over "M" for month-end frequency.
df.assign(**{"Month/Year": pd.to_datetime(df["Month/Year"], format="%m/%Y")}).set_index("Month/Year").resample("M").ffill().reset_index()
Should yield:
Month/Year Value
0 2018-01-31 100
1 2018-02-28 100
2 2018-03-31 200
3 2018-04-30 200
4 2018-05-31 200
5 2018-06-30 500
df here is your starting dataframe. It gets resampled to a monthly frequency, and we use the .ffill method to fill the values for the missing months.
I opted for a one-liner, but you can break it down into a more structured block of code. You can also reformat the Month/Year column after the resampling.
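For reference, here is one way the one-liner might be broken into steps, with the Month/Year column converted back to strings at the end. This is only a sketch: the sample frame is rebuilt from the question, fill_missing_months is a hypothetical helper name, and I assume the "MS" (month start) alias instead of "M" because it behaves the same across pandas versions and the day of the month is dropped when formatting anyway.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Month/Year": ["01/2018", "03/2018", "06/2018"],
    "Value": [100, 200, 500],
})


def fill_missing_months(frame):
    """Hypothetical helper: add the missing months and carry the last value forward."""
    out = frame.copy()
    # Parse "MM/YYYY" strings into timestamps (first day of each month).
    out["Month/Year"] = pd.to_datetime(out["Month/Year"], format="%m/%Y")
    out = out.set_index("Month/Year")
    # Upsample to one row per month; ffill copies the last known value
    # into the months that were absent from the input.
    out = out.resample("MS").ffill()
    out = out.reset_index()
    # Convert the timestamps back to the original "MM/YYYY" string format.
    out["Month/Year"] = out["Month/Year"].dt.strftime("%m/%Y")
    return out


result = fill_missing_months(df)
print(result)
```

The input must have one row per month at most and be parseable with the "%m/%Y" format; anything else would need extra handling.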
I hope this helps.
You can use pd.DataFrame.resample, then pd.Series.ffill to forward-fill null values. If you require string dates, an extra conversion is required, as below.
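df['date'] = pd.to_datetime(df['Month/Year'], format='%m/%Y')
# min_count=1 leaves empty months as NaN; a plain sum() returns 0 there,
# which ffill would not replace. On pandas 2.2+, prefer 'ME' over 'M'.
res = df.resample('M', on='date')['Value']\
        .sum(min_count=1).ffill().astype(int)\
        .reset_index()
res['date'] = res['date'].dt.strftime('%m/%Y')
print(res)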
date Value
0 01/2018 100
1 02/2018 100
2 03/2018 200
3 04/2018 200
4 05/2018 200
5 06/2018 500
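If the frequency alias is a concern, an alternative that avoids resample altogether is to work with monthly periods and reindex over the full range. This is a different technique from the one above, shown only as a sketch: the sample frame is rebuilt from the question, and the names months, series, full and res are my own.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Month/Year": ["01/2018", "03/2018", "06/2018"],
    "Value": [100, 200, 500],
})

# Convert the "MM/YYYY" strings to monthly periods.
months = pd.to_datetime(df["Month/Year"], format="%m/%Y").dt.to_period("M")
series = df.assign(month=months).set_index("month")["Value"]

# Build every month between the first and last observation.
full = pd.period_range(series.index.min(), series.index.max(), freq="M")

# Reindexing introduces NaN for the absent months; ffill carries the
# last known value forward, and astype(int) restores the integer dtype.
filled = series.reindex(full).ffill().astype(int)

res = filled.rename_axis("Month/Year").reset_index()
# Format the periods back into "MM/YYYY" strings.
res["Month/Year"] = res["Month/Year"].dt.strftime("%m/%Y")
print(res)
```

Reindexing makes the gaps explicit as NaN before they are filled, which can be handy if you want to inspect which months were missing.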