Pandas Series - groupby and cumulatively take the most recent non-null value

Question

I have a dataframe with a Category column (which we will group by) and a Value column. I want to add a new column, LastCleanValue, which shows the most recent non-null value seen so far within the row's group. If the group has not had any non-null values yet, the result is just null. For example:

import numpy as np
import pandas as pd
df = pd.DataFrame({'Category':['a','a','a','b','b','a','a','b','a','a','b'],
                   'Value':[np.nan, np.nan, 34, 40, 42, 25, np.nan, np.nan, 31, 33, np.nan]})

And the function should add a new column:

|    | Category   |   Value |   LastCleanValue |
|---:|:-----------|--------:|-----------------:|
|  0 | a          |     nan |              nan |
|  1 | a          |     nan |              nan |
|  2 | a          |      34 |               34 |
|  3 | b          |      40 |               40 |
|  4 | b          |      42 |               42 |
|  5 | a          |      25 |               25 |
|  6 | a          |     nan |               25 |
|  7 | b          |     nan |               42 |
|  8 | a          |      31 |               31 |
|  9 | a          |      33 |               33 |
| 10 | b          |     nan |               42 |

How can I do this in Pandas? I was attempting something like df.groupby('Category')['Value'].dropna().last()

Answer 1

This is really a forward fill (ffill) within each group:

df['LastCleanValue'] = df.groupby('Category')['Value'].ffill()
df['LastCleanValue']
Out[430]: 
0      NaN
1      NaN
2     34.0
3     40.0
4     42.0
5     25.0
6     25.0
7     42.0
8     31.0
9     33.0
10    42.0
Name: LastCleanValue, dtype: float64
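
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch of the grouped forward fill, using the sample data from the question. The names `via_transform` and `expected` are introduced here only for illustration, and the `transform` variant is an alternative spelling added as a cross-check, not something the answer above proposes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Category': ['a', 'a', 'a', 'b', 'b', 'a', 'a', 'b', 'a', 'a', 'b'],
    'Value': [np.nan, np.nan, 34, 40, 42, 25, np.nan, np.nan, 31, 33, np.nan],
})

# Forward-fill Value within each Category. The result is aligned to the
# original index, so it can be assigned straight back as a new column.
df['LastCleanValue'] = df.groupby('Category')['Value'].ffill()

# Alternative spelling (illustrative): apply Series.ffill per group via transform.
via_transform = df.groupby('Category')['Value'].transform(lambda s: s.ffill())

# Expected column, copied from the table in the question.
expected = [np.nan, np.nan, 34, 40, 42, 25, 25, 42, 31, 33, 42]

print(df)
```

Rows that come before the first non-null value of their group (rows 0 and 1 here) stay NaN, because a forward fill only carries earlier values forward and has nothing to carry yet.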

Solution 1
1 Accepted 2020-12-06 23:50:47
