Pandas Series - groupby and take cumulative most recent non-null
I have a dataframe with a Category column (which we will group by) and a Value column. I want to add a new column, LastCleanValue, which shows the most recent non-null value for this group. If there have not been any non-nulls yet in the group, we just take null. For example:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Category': ['a','a','a','b','b','a','a','b','a','a','b'],
                   'Value': [np.nan, np.nan, 34, 40, 42, 25, np.nan, np.nan, 31, 33, np.nan]})
And the function should add a new column:
| | Category | Value | LastCleanValue |
|---:|:-----------|--------:|-----------------:|
| 0 | a | nan | nan |
| 1 | a | nan | nan |
| 2 | a | 34 | 34 |
| 3 | b | 40 | 40 |
| 4 | b | 42 | 42 |
| 5 | a | 25 | 25 |
| 6 | a | nan | 25 |
| 7 | b | nan | 42 |
| 8 | a | 31 | 31 |
| 9 | a | 33 | 33 |
| 10 | b | nan | 42 |
How can I do this in Pandas? I was attempting something like
df.groupby('Category')['Value'].dropna().last()
This is more like ffill (forward fill), applied within each group:
df['new'] = df.groupby('Category')['Value'].ffill()
Out[430]:
0 NaN
1 NaN
2 34.0
3 40.0
4 42.0
5 25.0
6 25.0
7 42.0
8 31.0
9 33.0
10 42.0
Name: Value, dtype: float64
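As a self-contained sketch of the groupby-plus-ffill approach above, the snippet below rebuilds the example frame from the question and stores the result in LastCleanValue (the column name the question asks for, instead of new). The expected list is copied from the table in the question and is only there as a sanity check.

```python
import numpy as np
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    'Category': ['a', 'a', 'a', 'b', 'b', 'a', 'a', 'b', 'a', 'a', 'b'],
    'Value': [np.nan, np.nan, 34, 40, 42, 25, np.nan, np.nan, 31, 33, np.nan],
})

# Forward-fill Value separately within each Category group.
# The result keeps the original row order and index.
df['LastCleanValue'] = df.groupby('Category')['Value'].ffill()

# Expected column, taken from the table in the question
expected = [np.nan, np.nan, 34, 40, 42, 25, 25, 42, 31, 33, 42]
matches = np.allclose(df['LastCleanValue'].to_numpy(), expected, equal_nan=True)
print(df)
print("matches expected:", matches)
```

Because ffill only carries values forward inside a group, rows that come before a group's first non-null value (rows 0 and 1 here) stay NaN, which is the behaviour the question asks for.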