
pandas ffill/bfill for a specific number of observations

I have the following dataframe:

 id     indicator 
 1          NaN
 1          NaN
 1          1
 1          NaN
 1          NaN
 1          NaN

In reality, I have several more ids. My question is: how do I forward or backward fill over a limited range only, e.g. just the next (or previous) 2 observations? My dataframe should look like this:

 id     indicator 
 1          NaN
 1          NaN 
 1          1
 1          1
 1          1
 1          NaN

I know the command

df.groupby("id")["indicator"].fillna(value=None, method="ffill")         

However, this fills all the missing values instead of just the next two observations. Does anyone know a solution?

I think DataFrameGroupBy.ffill or DataFrameGroupBy.bfill with the limit parameter is nicer. limit caps how many consecutive NaN values are filled after (or before) each valid value:

df.groupby("id")["indicator"].ffill(limit=3)

df.groupby("id")["indicator"].bfill(limit=3)
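As a minimal sketch, here is the question's six-row frame run through both calls with limit=2, the number of observations the question asks for. The variable names forward and backward are only illustrative.

```python
import numpy as np
import pandas as pd

# The six-row frame from the question (a single id).
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 1],
    "indicator": [np.nan, np.nan, 1, np.nan, np.nan, np.nan],
})

# Fill at most 2 rows after each valid value, separately per id.
forward = df.groupby("id")["indicator"].ffill(limit=2)

# Fill at most 2 rows before each valid value, separately per id.
backward = df.groupby("id")["indicator"].bfill(limit=2)

print(pd.DataFrame({"forward": forward, "backward": backward}))
```

The forward column matches the desired output in the question: rows 3 and 4 are filled and row 5 stays NaN.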

Sample:

# the value 5.0 is the second-to-last row of its group (id 2), so only one row after it is filled
df['filled'] = df.groupby("id")["indicator"].ffill(limit=2)
print(df)
    id  indicator  filled
0    1        NaN     NaN
1    1        NaN     NaN
2    1        1.0     1.0
3    1        NaN     1.0
4    1        NaN     1.0
5    1        NaN     NaN
6    1        NaN     NaN
7    1        NaN     NaN
8    1        4.0     4.0
9    1        NaN     4.0
10   1        NaN     4.0
11   1        NaN     NaN
12   1        NaN     NaN
13   2        NaN     NaN
14   2        NaN     NaN
15   2        1.0     1.0
16   2        NaN     1.0
17   2        NaN     1.0
18   2        NaN     NaN
19   2        5.0     5.0
20   2        NaN     5.0
21   3        3.0     3.0
22   3        NaN     3.0
23   3        NaN     3.0
24   3        NaN     NaN
25   3        NaN     NaN
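The sample does not show how df was built, so the sketch below reconstructs it from the printed output (the construction is an assumption; only the values come from the sample). It also adds a backward fill, with hypothetical column names grouped_bfill and ungrouped_bfill, to show why the groupby matters: without it, a fill can leak across id boundaries.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Rebuilt from the printed sample: 13 rows for id 1, 8 for id 2, 5 for id 3.
df = pd.DataFrame({
    "id": [1] * 13 + [2] * 8 + [3] * 5,
    "indicator": [
        nan, nan, 1, nan, nan, nan, nan, nan, 4, nan, nan, nan, nan,  # id 1
        nan, nan, 1, nan, nan, nan, 5, nan,                           # id 2
        3, nan, nan, nan, nan,                                        # id 3
    ],
})

# Same call as in the sample above.
df["filled"] = df.groupby("id")["indicator"].ffill(limit=2)

# Backward fill per group: row 20 (last row of id 2) stays NaN.
df["grouped_bfill"] = df.groupby("id")["indicator"].bfill(limit=2)

# Backward fill without grouping: row 20 picks up 3.0 from id 3.
df["ungrouped_bfill"] = df["indicator"].bfill(limit=2)

print(df.loc[18:22])
```

Row 20 is where the two backward fills differ: the grouped version leaves it empty, while the ungrouped version borrows the first value of the next id.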

You're almost there. Straight from the docs, on the limit parameter of fillna:

If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.

df.groupby("id")["indicator"].fillna(value=None, method="ffill", limit=3)
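To make the "partially filled" wording concrete, here is a small sketch on a plain Series with a gap of four NaNs. It calls ffill directly rather than fillna(method="ffill"), because, as far as I know, newer pandas versions deprecate the method argument of fillna. The names s and partially_filled are illustrative.

```python
import numpy as np
import pandas as pd

# A gap of four consecutive NaNs between 1.0 and 2.0, then one trailing NaN.
s = pd.Series([1.0, np.nan, np.nan, np.nan, np.nan, 2.0, np.nan])

# limit=2 fills only the first two NaNs of the long gap;
# the trailing single NaN is within the limit and is filled completely.
partially_filled = s.ffill(limit=2)

print(partially_filled.tolist())
```

The limit counts consecutive NaNs after each valid value, so it resets at 2.0 and the last row is still filled.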
