Pandas Series - groupby and take cumulative most recent non-null

I have a dataframe with a Category column (which we will group by) and a Value column. I want to add a new column LastCleanValue which shows the most recent non-null value within each row's group. If there have not been any non-null values yet in the group, we just take null. For example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Category':['a','a','a','b','b','a','a','b','a','a','b'],
                   'Value':[np.nan, np.nan, 34, 40, 42, 25, np.nan, np.nan, 31, 33, np.nan]})

And the function should add a new column:

|    | Category   |   Value |   LastCleanValue |
|---:|:-----------|--------:|-----------------:|
|  0 | a          |     nan |              nan |
|  1 | a          |     nan |              nan |
|  2 | a          |      34 |               34 |
|  3 | b          |      40 |               40 |
|  4 | b          |      42 |               42 |
|  5 | a          |      25 |               25 |
|  6 | a          |     nan |               25 |
|  7 | b          |     nan |               42 |
|  8 | a          |      31 |               31 |
|  9 | a          |      33 |               33 |
| 10 | b          |     nan |               42 |
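For reference, the rule the table encodes (remember the latest non-null value per category, and emit it on every row of that category) can be written as a plain-Python loop. This is a minimal sketch for illustrating the semantics, not a pandas solution; the helper name `last_clean_values` is hypothetical, and NaN is assumed to be the only missing-value marker.

```python
import math

def last_clean_values(categories, values):
    """Hypothetical helper: for each row, return the most recent non-NaN
    value seen so far in that row's category (NaN if none yet)."""
    last_seen = {}  # category -> most recent non-NaN value
    result = []
    for cat, val in zip(categories, values):
        if not math.isnan(val):
            last_seen[cat] = val  # update this category's latest clean value
        # Categories with no non-NaN value yet fall back to NaN
        result.append(last_seen.get(cat, math.nan))
    return result

nan = math.nan
categories = ['a', 'a', 'a', 'b', 'b', 'a', 'a', 'b', 'a', 'a', 'b']
values = [nan, nan, 34, 40, 42, 25, nan, nan, 31, 33, nan]
print(last_clean_values(categories, values))
```

Each category keeps its own "last seen" slot, which is why row 7 (category b) gets 42 rather than the 25 from the preceding a rows.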

How can I do this in Pandas? I was attempting something like df.groupby('Category')['Value'].dropna().last()
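One reason an approach built on last() does not produce the requested column: GroupBy.last() is an aggregation, so it collapses each group to a single value (the group's last non-null entry) instead of returning one value per row. The sketch below, which rebuilds the example frame, shows this; it is illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Category': ['a', 'a', 'a', 'b', 'b', 'a', 'a', 'b', 'a', 'a', 'b'],
                   'Value': [np.nan, np.nan, 34, 40, 42, 25, np.nan, np.nan, 31, 33, np.nan]})

# GroupBy.last() aggregates: one row per group, holding that group's last non-null value
per_group = df.groupby('Category')['Value'].last()
print(per_group)  # a -> 33.0, b -> 42.0
```

That yields only the final clean value per category, not the running (cumulative) value the table asks for.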

This is more like ffill (forward fill), applied within each group:

df.groupby('Category')['Value'].ffill()
Out[430]: 
0      NaN
1      NaN
2     34.0
3     40.0
4     42.0
5     25.0
6     25.0
7     42.0
8     31.0
9     33.0
10    42.0
Name: Value, dtype: float64

Assign the result back to add the column:

df['LastCleanValue'] = df.groupby('Category')['Value'].ffill()
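A self-contained version of the answer, for anyone who wants to run it end to end. It also adds a hypothetical PlainFfill column (not part of the question) purely to contrast the grouped fill with an ungrouped df['Value'].ffill(), which lets values leak from one category into another.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Category': ['a', 'a', 'a', 'b', 'b', 'a', 'a', 'b', 'a', 'a', 'b'],
                   'Value': [np.nan, np.nan, 34, 40, 42, 25, np.nan, np.nan, 31, 33, np.nan]})

# Forward-fill within each Category: a NaN takes the last non-null value of its own group,
# and leading NaNs in a group (no earlier clean value) stay NaN
df['LastCleanValue'] = df.groupby('Category')['Value'].ffill()

# For contrast only: an ungrouped forward fill ignores Category entirely
df['PlainFfill'] = df['Value'].ffill()

print(df)
```

Rows 7 and 10 (both category b) show the difference: the grouped fill gives 42, while the ungrouped fill carries over 25 and 33 from the preceding a rows.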