

Pandas: set all values that are <= 0 to the maximum value in a column by group, but only after the last positive value in that group

I am trying to set all values that are <= 0, by group, to the maximum value in that group, but only after the last positive value. That is, all values <= 0 in the group that come before the last positive value must be ignored. Example:

import pandas as pd

data = {'group': ['A', 'A', 'A', 'A', 'A', 'B', 'B',
                  'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C'],
        'value': [3, 0, 8, 7, 0, -1, 0, 9, -2, 0, 0, 2, 0, 5, 0, 1]}
df = pd.DataFrame(data)
df

  group  value
0   A      3
1   A      0
2   A      8
3   A      7
4   A      0
5   B     -1
6   B      0
7   B      9
8   B     -2
9   B      0
10  B      0
11  C      2
12  C      0
13  C      5
14  C      0
15  C      1

The expected result is:

  group  value
0   A      3
1   A      0
2   A      8
3   A      7
4   A      8
5   B     -1
6   B      0
7   B      9
8   B      9
9   B      9
10  B      9
11  C      2
12  C      0
13  C      5
14  C      0
15  C      1

Thanks for any advice.
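To pin down the rule before looking at pandas, here is a minimal plain-Python reference sketch of it. It assumes each group's rows are contiguous, as in the example, so that itertools.groupby (which only merges consecutive equal keys) can split them; fill_trailing_nonpositive is a hypothetical helper name, not part of any library.

```python
from itertools import groupby

groups = ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B',
          'C', 'C', 'C', 'C', 'C']
values = [3, 0, 8, 7, 0, -1, 0, 9, -2, 0, 0, 2, 0, 5, 0, 1]


def fill_trailing_nonpositive(vals):
    """Replace every value after the group's last positive value with the group max.

    Everything after the last positive value is <= 0 by definition, so values
    before it (including earlier zeros and negatives) are left alone.
    """
    out = list(vals)
    # Position of the last strictly positive value, or -1 if the group has none.
    last_pos = max((i for i, v in enumerate(out) if v > 0), default=-1)
    group_max = max(out)
    for i in range(last_pos + 1, len(out)):
        out[i] = group_max
    return out


result = []
# Assumes each group's rows are contiguous, because itertools.groupby
# only merges consecutive equal keys.
for _, rows in groupby(zip(groups, values), key=lambda gv: gv[0]):
    result.extend(fill_trailing_nonpositive([v for _, v in rows]))

print(result)
# [3, 0, 8, 7, 8, -1, 0, 9, 9, 9, 9, 2, 0, 5, 0, 1]
```

Under this reading, a group with no positive value at all is replaced entirely by its own maximum.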

Start by adding a column that identifies the rows with a negative value (more precisely, <= 0):

df['neg'] = (df['value'] <= 0)
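A quick check of this flag on the example data, as a minimal sketch that rebuilds the question's DataFrame: <= 0 (equivalently Series.le(0)) flags zeros as well as negatives, whereas a strict < 0 (Series.lt(0)) would flag only rows 5 and 8.

```python
import pandas as pd

df = pd.DataFrame({
    'group': list('AAAAA') + list('BBBBBB') + list('CCCCC'),
    'value': [3, 0, 8, 7, 0, -1, 0, 9, -2, 0, 0, 2, 0, 5, 0, 1],
})

neg = df['value'] <= 0        # boolean Series, same as df['value'].le(0)
strict = df['value'].lt(0)    # strictly negative values only

print(neg[neg].index.tolist())        # [1, 4, 5, 6, 8, 9, 10, 12, 14]
print(strict[strict].index.tolist())  # [5, 8]
```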

Then, for each group, find the run of contiguous entries at the end of the group that have 'neg' set to True. To do that, reverse the order of the DataFrame (with .iloc[::-1]) and then use .cumprod() on the 'neg' column. cumprod() treats True as 1 and False as 0, so the cumulative product stays 1 as long as you see only True values, and becomes (and stays) 0 as soon as you see the first False. Since the order is reversed, we're walking backwards from the end of each group, so this finds the run of True values at the end. The result keeps the original row labels, so assigning it to df['upd'] lines it back up with the original row order.

df['upd'] = df.iloc[::-1].groupby('group')['neg'].cumprod().astype(bool)
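The reversed cumulative product can be traced on the example data with a minimal sketch. It casts the flag to int before cumprod() so the arithmetic is explicitly 0/1 rather than depending on how a particular pandas version treats a boolean cumprod; otherwise it follows the line above. The result comes back in reversed row order, and sort_index() is used only for display.

```python
import pandas as pd

df = pd.DataFrame({
    'group': list('AAAAA') + list('BBBBBB') + list('CCCCC'),
    'value': [3, 0, 8, 7, 0, -1, 0, 9, -2, 0, 0, 2, 0, 5, 0, 1],
})
df['neg'] = (df['value'] <= 0).astype(int)

# Walk each group from its last row backwards; the running product stays 1
# only while every row seen so far is <= 0.
run = df.iloc[::-1].groupby('group')['neg'].cumprod()
print(run.index.tolist()[:6])          # [15, 14, 13, 12, 11, 10]
print(run.loc[[10, 9, 8, 7, 6, 5]].tolist())  # group B: [1, 1, 1, 0, 0, 0]

upd = run.astype(bool).sort_index()
print(upd[upd].index.tolist())         # [4, 8, 9, 10]
```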

Now that we know which entries to update, we just need to know what to update them to, which is the maximum of the group. We can use transform('max') on a groupby to get that value for every row, and then all that's left is the actual update of 'value' where 'upd' is set:

df.loc[df['upd'], 'value'] = df.groupby('group')['value'].transform('max')
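A minimal sketch of what the right-hand side contributes: transform('max') returns one value per row (the maximum of that row's group, on the original index), unlike .max(), which returns one value per group. Because .loc assignment aligns the right-hand Series on the index, only the flagged rows pick up their group's maximum. The mask is recomputed here with the expression from the previous step, cast to int as a precaution.

```python
import pandas as pd

df = pd.DataFrame({
    'group': list('AAAAA') + list('BBBBBB') + list('CCCCC'),
    'value': [3, 0, 8, 7, 0, -1, 0, 9, -2, 0, 0, 2, 0, 5, 0, 1],
})

per_group = df.groupby('group')['value'].max()            # one entry per group
per_row = df.groupby('group')['value'].transform('max')   # one entry per row
print(per_group.to_dict())   # {'A': 8, 'B': 9, 'C': 5}

# Flag the trailing run of <= 0 values in each group (same idea as 'upd').
upd = (df.assign(neg=(df['value'] <= 0).astype(int))
         .iloc[::-1].groupby('group')['neg'].cumprod()
         .astype(bool).sort_index())

# Only rows where upd is True are written; per_row is aligned on the index.
df.loc[upd, 'value'] = per_row
print(df['value'].tolist())
# [3, 0, 8, 7, 8, -1, 0, 9, 9, 9, 9, 2, 0, 5, 0, 1]
```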

Finish by dropping the two auxiliary columns used in the process:

df = df.drop(['neg', 'upd'], axis=1)

The result I got matches your expected result.
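The three steps can be wrapped into a small helper that leaves the input DataFrame untouched and can be checked against the expected output from the question. This is a minimal sketch: fill_after_last_positive and its group_col/value_col parameters are hypothetical names, and the flag is again cast to int before cumprod().

```python
import pandas as pd


def fill_after_last_positive(df, group_col='group', value_col='value'):
    """Hypothetical helper: return a copy of df where, within each group, the
    values <= 0 that follow the group's last positive value become the group max."""
    neg = (df[value_col] <= 0).astype(int)
    # Reversed per-group cumprod: 1 only on the trailing run of <= 0 rows.
    upd = (neg.iloc[::-1]
              .groupby(df[group_col].iloc[::-1])
              .cumprod()
              .astype(bool)
              .reindex(df.index))           # back to the original row order
    group_max = df.groupby(group_col)[value_col].transform('max')
    out = df.copy()
    out.loc[upd, value_col] = group_max     # aligned on the index
    return out


df = pd.DataFrame({
    'group': list('AAAAA') + list('BBBBBB') + list('CCCCC'),
    'value': [3, 0, 8, 7, 0, -1, 0, 9, -2, 0, 0, 2, 0, 5, 0, 1],
})
expected = [3, 0, 8, 7, 8, -1, 0, 9, 9, 9, 9, 2, 0, 5, 0, 1]

result = fill_after_last_positive(df)
print(result['value'].tolist() == expected)   # True
print(df['value'].tolist()[4])                # 0 (the input is not modified)
```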


UPDATE: Or do the whole operation in a single (long!) statement, without adding any auxiliary columns to the original DataFrame:

df.loc[
    df.assign(
        neg=(df['value'] <= 0)
    ).iloc[::-1].groupby(
        'group'
    )['neg'].cumprod().astype(bool),
    'value'
] = df.groupby(
    'group'
)['value'].transform('max')
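The statement above modifies df in place. If a new DataFrame is preferred, the same reversed-cumprod mask can be passed to Series.mask, which replaces the values where the condition is True with the matching entry of other. A minimal sketch, assuming the question's column names:

```python
import pandas as pd

df = pd.DataFrame({
    'group': list('AAAAA') + list('BBBBBB') + list('CCCCC'),
    'value': [3, 0, 8, 7, 0, -1, 0, 9, -2, 0, 0, 2, 0, 5, 0, 1],
})

rev = df.iloc[::-1]
# True on the trailing run of <= 0 values in each group, in original row order.
trailing = (rev['value'].le(0).astype(int)
               .groupby(rev['group']).cumprod()
               .astype(bool).sort_index())

result = df.assign(
    value=df['value'].mask(trailing, df.groupby('group')['value'].transform('max'))
)
print(result['value'].tolist())
# [3, 0, 8, 7, 8, -1, 0, 9, 9, 9, 9, 2, 0, 5, 0, 1]
```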

You can do it this way:

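import numpy as np
# Blank out values in groups that contain a strictly negative number, from the group's minimum onward...
(df.loc[(df.assign(m=df['value'].lt(0)).groupby(['group'], sort=False)['m'].transform('any')) &
    (df.index>=df.groupby('group')['value'].transform('idxmin')),'value']) = np.nan
# ...then forward-fill the blanks within each group
df['value']=df.groupby('group').ffill()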
df

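Output (row 4 stays 0.0, whereas the expected result has 8 there: group A contains no strictly negative value, so the lt(0) condition leaves it unchanged):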

group   value
0   A   3.0
1   A   0.0
2   A   8.0
3   A   7.0
4   A   0.0
5   B   -1.0
6   B   0.0
7   B   9.0
8   B   9.0
9   B   9.0
10  B   9.0
11  C   2.0
12  C   0.0
13  C   5.0
14  C   0.0
15  C   1.0
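For comparison with the expected result, here is a minimal sketch of the mask this answer builds versus the rows the question asks to change. answer_mask re-creates the answer's condition; required_mask is a swapped-in alternative that counts, per group and from the end backwards, how many positive values remain (a reversed per-group cumsum) and selects the rows where that count is 0. Filling uses the group maximum, because a forward fill would carry 7, not 8, into row 4 of group A.

```python
import pandas as pd

df = pd.DataFrame({
    'group': list('AAAAA') + list('BBBBBB') + list('CCCCC'),
    'value': [3, 0, 8, 7, 0, -1, 0, 9, -2, 0, 0, 2, 0, 5, 0, 1],
})

# The answer's condition: the group contains a strictly negative value,
# and the row is at or after the group's minimum.
has_neg = df['value'].lt(0).groupby(df['group']).transform('any')
from_min = df.index.to_series().ge(df.groupby('group')['value'].transform('idxmin'))
answer_mask = has_neg & from_min

# Rows with no positive value at or after them within their group.
rev = df.iloc[::-1]
positives_left = (rev['value'].gt(0).astype(int)
                     .groupby(rev['group']).cumsum().sort_index())
required_mask = positives_left.eq(0)

print(answer_mask[answer_mask].index.tolist())      # [8, 9, 10]
print(required_mask[required_mask].index.tolist())  # [4, 8, 9, 10]

filled = df['value'].mask(required_mask, df.groupby('group')['value'].transform('max'))
print(filled.tolist())
# [3, 0, 8, 7, 8, -1, 0, 9, 9, 9, 9, 2, 0, 5, 0, 1]
```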
