
pandas: using .where() to replace values in a .groupby() object

Consider a DataFrame that contains several groups of integers:

import pandas as pd

d = pd.DataFrame({'label': ['a','a','a','a','b','b','b','b'], 'value': [1,2,3,2,7,1,8,9]})
print(d)
  label  value
0     a      1
1     a      2
2     a      3
3     a      2
4     b      7
5     b      1
6     b      8
7     b      9

Within each group, every integer has to be greater than or equal to the previous one. If it is not, it takes on the value of the previous integer. I do the replacement with

s.where(~(s < s.shift()), s.shift())

This works fine for a single Series. I can even group the DataFrame and loop through each extracted Series:

grouped = d.groupby('label')['value']
for _, s in grouped:
    # Replace any value smaller than its predecessor with that predecessor
    print(s.where(~(s < s.shift()), s.shift()))
0    1.0
1    2.0
2    3.0
3    3.0
Name: value, dtype: float64
4    7.0
5    7.0
6    8.0
7    9.0
Name: value, dtype: float64
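One detail worth checking: s.shift() refers to the original previous value, not to the value after replacement, so when two drops occur in a row the result is not guaranteed to be non-decreasing. A small check on a hypothetical Series (the data below is illustrative, not from the question), contrasted with a running maximum:

```python
import pandas as pd

# Hypothetical Series with two consecutive drops (5 -> 3 -> 2)
s = pd.Series([5, 3, 2, 6])

# The shift-based rule: each value is compared with the ORIGINAL previous
# value, so the 2 is replaced by 3, not by the already-corrected 5
shift_fixed = s.where(~(s < s.shift()), s.shift())
print(shift_fixed.tolist())

# A running maximum carries the largest value seen so far forward instead
print(s.cummax().tolist())
```

Depending on the pandas version, the shift-based result may come back as float64 (because s.shift() introduces NaN) or be downcast to int64, but the values are the same either way.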

However, how do I now get these values back into my original DataFrame?

Or is there a better way to do this? I'm not attached to using .groupby, and I don't consider the for loop a pretty solution either.
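If the exact shift-based rule is what's wanted, one loop-free option is a sketch like the following: groupby(...).shift() returns a Series aligned with the original index, so the same where condition can be applied to the whole column at once and assigned straight back to the frame. The column name value_fixed is hypothetical.

```python
import pandas as pd

d = pd.DataFrame({'label': ['a','a','a','a','b','b','b','b'],
                  'value': [1,2,3,2,7,1,8,9]})

# Previous value within each label group; the first row of each group is NaN.
# The result keeps d's index, so it lines up row by row with d.
prev = d.groupby('label')['value'].shift()

# Same rule as the loop: keep a value unless it is smaller than its predecessor
d['value_fixed'] = d['value'].where(~(d['value'] < prev), prev)
print(d)
```

Because the grouping only affects how prev is computed, no per-group Series ever needs to be stitched back together.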

IIUC, you can use cummax on the groupby, like this:

# Running maximum of 'value' within each label group, aligned on d's index
d['val_max'] = d.groupby('label')['value'].cummax()
print(d)
  label  value  val_max
0     a      1        1
1     a      2        2
2     a      3        3
3     a      2        3
4     b      7        7
5     b      1        7
6     b      8        8
7     b      9        9
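If the corrected values should overwrite the original column rather than go into a new val_max column, the same expression can be assigned back to 'value'. A minimal sketch on a hypothetical frame (not from the question) where group b drops twice in a row:

```python
import pandas as pd

# Hypothetical data: group 'b' drops twice in a row (7 -> 1 -> 0)
d = pd.DataFrame({'label': ['a', 'a', 'a', 'b', 'b', 'b', 'b'],
                  'value': [1, 3, 2, 7, 1, 0, 9]})

# Running maximum per group, written back over the original column;
# the result shares d's index, so plain column assignment aligns correctly
d['value'] = d.groupby('label')['value'].cummax()
print(d)

# Every group is now non-decreasing
print(d.groupby('label')['value'].apply(lambda s: s.is_monotonic_increasing).all())
```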
