Pandas: set all values that are <= 0 to the maximum value in a column by group, but only after the last positive value in that group
I am trying to set all values that are <= 0, by group, to the maximum value in that group, but only after the last positive value. That is, all values <= 0 in the group that come before the last positive value must be left unchanged. Example:
import pandas as pd

data = {'group': ['A', 'A', 'A', 'A', 'A', 'B', 'B',
                  'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C'],
        'value': [3, 0, 8, 7, 0, -1, 0, 9, -2, 0, 0, 2, 0, 5, 0, 1]}
df = pd.DataFrame(data)
df
group value
0 A 3
1 A 0
2 A 8
3 A 7
4 A 0
5 B -1
6 B 0
7 B 9
8 B -2
9 B 0
10 B 0
11 C 2
12 C 0
13 C 5
14 C 0
15 C 1
and the result must be:
group value
0 A 3
1 A 0
2 A 8
3 A 7
4 A 8
5 B -1
6 B 0
7 B 9
8 B 9
9 B 9
10 B 9
11 C 2
12 C 0
13 C 5
14 C 0
15 C 1
Thanks in advance for any advice.
Start by adding a column to identify the rows with non-positive values (<= 0):
df['neg'] = (df['value'] <= 0)
Then, for each group, find the contiguous run of entries at the end of the group that have 'neg' set to True. To do that, reverse the order of the DataFrame (with .iloc[::-1]) and then use .cumprod() on the 'neg' column. cumprod() treats True as 1 and False as 0, so the cumulative product stays 1 as long as every value seen so far is True, and drops to 0 (and stays 0) at the first False. Since we reversed the order, we are walking backwards from the end, so this picks out the run of True values at the end of each group. Assigning the result to a new column realigns it with the original row order by index.
df['upd'] = df.iloc[::-1].groupby('group')['neg'].cumprod().astype(bool)
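To see why the reversed cumulative product isolates the trailing run, here is a minimal sketch on a single toy Series (the values are group B from the question; the names `s`, `neg`, and `trailing` are only illustrative). The explicit cast to int is an assumption made to keep the arithmetic unambiguous.

```python
import pandas as pd

# Toy data: one group whose last positive value (9) is followed by non-positive values.
s = pd.Series([-1, 0, 9, -2, 0, 0])

# 1 where the value is <= 0, else 0.
neg = (s <= 0).astype(int)

# Reverse, take the cumulative product, then restore the original order.
# The product stays 1 only until the first positive value seen from the end.
trailing = neg.iloc[::-1].cumprod().iloc[::-1].astype(bool)

print(trailing.tolist())
# [False, False, False, True, True, True]
```

The leading -1 and 0 are also non-positive, but they sit before the last positive value, so the product has already dropped to 0 by the time they are reached.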
Now that we know which entries to update, we just need to know what to update them to, which is the max of the group. We can use transform('max') on a groupby to get that value for every row, and then all that's left is to do the actual update of 'value' where 'upd' is set:
df.loc[df['upd'], 'value'] = df.groupby('group')['value'].transform('max')
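As a small illustration of what the update relies on, the sketch below shows that transform('max') returns one value per row (not one per group), and that assigning a Series through .loc only touches the selected rows, matched by index. The tiny frame and the `mask` Series are hypothetical and chosen just for this demonstration.

```python
import pandas as pd

demo = pd.DataFrame({'group': ['A', 'A', 'B', 'B'],
                     'value': [3, 8, 9, -2]})

# One entry per row, each holding the maximum of that row's group.
group_max = demo.groupby('group')['value'].transform('max')
print(group_max.tolist())
# [8, 8, 9, 9]

# Hypothetical mask selecting only the last row.
mask = pd.Series([False, False, False, True])

# The right-hand side is aligned by index, so only row 3 is overwritten.
demo.loc[mask, 'value'] = group_max
print(demo['value'].tolist())
# [3, 8, 9, 9]
```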
We can finish by dropping the two auxiliary columns we used in the process:
df = df.drop(['neg', 'upd'], axis=1)
The result I got matches your expected result.
UPDATE: Or do the whole operation in a single (long!) statement, without adding any auxiliary columns to the original DataFrame:
df.loc[
    df.assign(
        neg=(df['value'] <= 0)
    ).iloc[::-1].groupby(
        'group'
    )['neg'].cumprod().astype(bool),
    'value'
] = df.groupby(
    'group'
)['value'].transform('max')
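The steps above can also be wrapped in a small helper that leaves the input untouched. This is only a sketch: the function name `fill_trailing_nonpositive` and its parameters are made up for illustration, and the cast to int before cumprod() is a defensive assumption rather than something the answer requires.

```python
import pandas as pd


def fill_trailing_nonpositive(df, group_col='group', value_col='value'):
    """Return a copy where each group's trailing run of values <= 0
    (those after the last positive value) is replaced by the group maximum."""
    out = df.copy()
    neg = (out[value_col] <= 0).astype(int)
    # Walk each group backwards; the product stays 1 only inside the trailing run.
    trailing = (
        neg.iloc[::-1]
        .groupby(out[group_col].iloc[::-1])
        .cumprod()
        .astype(bool)
        .iloc[::-1]  # restore the original row order
    )
    group_max = out.groupby(group_col)[value_col].transform('max')
    out.loc[trailing, value_col] = group_max
    return out


data = {'group': ['A', 'A', 'A', 'A', 'A', 'B', 'B',
                  'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C'],
        'value': [3, 0, 8, 7, 0, -1, 0, 9, -2, 0, 0, 2, 0, 5, 0, 1]}
df = pd.DataFrame(data)
result = fill_trailing_nonpositive(df)
print(result['value'].tolist())
# [3, 0, 8, 7, 8, -1, 0, 9, 9, 9, 9, 2, 0, 5, 0, 1]
```

Note that if a group contains no positive value at all, the product never drops to 0, so every row of that group is set to the group maximum; whether that is desirable depends on the use case.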
You can do it this way.
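import numpy as np

# Blank out rows at or after the group's minimum, but only in groups that contain a negative value
(df.loc[(df.assign(m=df['value'].lt(0)).groupby(['group'], sort=False)['m'].transform('any')) &
        (df.index >= df.groupby('group')['value'].transform('idxmin')), 'value']) = np.nan
# Forward-fill the blanked rows within each group
df['value'] = df.groupby('group')['value'].ffill()
df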
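Output (note that row 4 of group A stays 0.0 here, whereas the expected result in the question is 8, because this approach only modifies groups that contain a negative value):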
group value
0 A 3.0
1 A 0.0
2 A 8.0
3 A 7.0
4 A 0.0
5 B -1.0
6 B 0.0
7 B 9.0
8 B 9.0
9 B 9.0
10 B 9.0
11 C 2.0
12 C 0.0
13 C 5.0
14 C 0.0
15 C 1.0
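For comparison, a variant that avoids NaN and forward-filling and reproduces the question's expected result for every group could look like the sketch below. It is not the answerer's method: it counts positive values from the end of each group and uses Series.mask to substitute the group maximum. The names `positive` and `after_last_pos` are illustrative.

```python
import pandas as pd

data = {'group': ['A', 'A', 'A', 'A', 'A', 'B', 'B',
                  'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C'],
        'value': [3, 0, 8, 7, 0, -1, 0, 9, -2, 0, 0, 2, 0, 5, 0, 1]}
df = pd.DataFrame(data)

# 1 where the value is positive, else 0.
positive = (df['value'] > 0).astype(int)

# Count positives from the end of each group; a count of 0 means the row
# lies after the last positive value of its group.
after_last_pos = (
    positive.iloc[::-1]
    .groupby(df['group'].iloc[::-1])
    .cumsum()
    .eq(0)
    .iloc[::-1]  # restore the original row order
)

# Replace only those rows with the group maximum.
group_max = df.groupby('group')['value'].transform('max')
df['value'] = df['value'].mask(after_last_pos, group_max)
print(df['value'].tolist())
# [3, 0, 8, 7, 8, -1, 0, 9, 9, 9, 9, 2, 0, 5, 0, 1]
```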