
pandas calculate difference based on indicators grouped by a column

Here is my question. I don't know how to describe it, so I will just give an example.

a b k
0 0 0
0 1 1
0 2 0
0 3 0
0 4 1
0 5 0
1 0 0
1 1 1
1 2 0
1 3 1
1 4 0

Here, "a" is the user id, "b" is time, and "k" is a binary indicator flag. The values of "b" are guaranteed to be consecutive. What I want to get is this:

a b k diff_b
0 0 0 nan
0 1 1 nan
0 2 0 1
0 3 0 2
0 4 1 3
0 5 0 1
1 0 0 nan
1 1 1 nan
1 2 0 1
1 3 1 2
1 4 0 1

So, diff_b is a time difference variable. It shows the duration between the current time point and the most recent earlier time point with an action (k == 1). If there was no action before, it is nan. diff_b is grouped by a: for each user, it is calculated independently.

Can anyone revise my title? I don't know how to describe it in English. So complex...

Thank you!

IIUC

df['New'] = df.b.loc[df.k == 1]  # keep b only on rows where k equals 1; other rows become NaN
df['New'] = df.groupby('a').New.transform(lambda x: x.ffill().shift())  # forward-fill within each user, then shift so a row never sees its own action
df.b - df['New']  # time since the previous action
Out[260]: 
0     NaN
1     NaN
2     1.0
3     2.0
4     3.0
5     1.0
6     NaN
7     NaN
8     1.0
9     2.0
10    1.0
dtype: float64
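
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The DataFrame is copied from the question; the function name `time_since_last_action` is made up for illustration, and `Series.where` stands in for the `.loc` assignment so that no helper column is added to `df`.

```python
import pandas as pd

# Sample data copied from the question: a = user id, b = time, k = action flag.
df = pd.DataFrame({
    "a": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    "b": [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4],
    "k": [0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0],
})


def time_since_last_action(frame):
    """Return b minus the b of the most recent earlier row with k == 1, per user."""
    # Keep b only where an action happened; every other row becomes NaN.
    action_time = frame["b"].where(frame["k"] == 1)
    # Within each user, carry the last action time forward, then shift by one
    # row so that a row never counts its own action.
    last_action = action_time.groupby(frame["a"]).transform(
        lambda s: s.ffill().shift()
    )
    return frame["b"] - last_action


diff_b = time_since_last_action(df)
print(diff_b)
```

Because this subtracts the actual time of the last action, it should not depend on "b" being consecutive, only on the rows being sorted by time within each user.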

Create partitions of the rows that follow a k == 1, up to and including the next k == 1, using shift and cumsum within each group of a:

parts = df.groupby('a').k.transform(lambda x: x.shift(fill_value=0).cumsum())  # fill_value=0 keeps each user's first row in partition 0

Group by df.a and parts, then compute the difference between b and b.min() within each group, plus 1:

vals = df.groupby([df.a, parts]).b.transform(lambda x: x - x.min() + 1)

Set the values to null where parts == 0 and assign the result back to the dataframe:

import numpy as np
df['diff_b'] = np.select([parts != 0], [vals], np.nan)

outputs:

    a  b  k  diff_b
0   0  0  0     NaN
1   0  1  1     NaN
2   0  2  0     1.0
3   0  3  0     2.0
4   0  4  1     3.0
5   0  5  0     1.0
6   1  0  0     NaN
7   1  1  1     NaN
8   1  2  0     1.0
9   1  3  1     2.0
10  1  4  0     1.0
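
Put together, the partition approach might look like the sketch below. It rebuilds the question's data so it can be run on its own. Two details are my own substitutions rather than part of the steps above: `shift(fill_value=0)` puts each user's first row in partition 0 instead of leaving it NaN, and `Series.where` replaces `np.select` for the final masking.

```python
import pandas as pd

# Sample data copied from the question: a = user id, b = time, k = action flag.
df = pd.DataFrame({
    "a": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    "b": [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4],
    "k": [0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0],
})

# Per user, count how many actions happened strictly before each row.
# Rows sharing a count form one partition; partition 0 has no earlier action.
parts = df.groupby("a")["k"].transform(lambda s: s.shift(fill_value=0).cumsum())

# Inside each (user, partition) block, measure time from the block's first row.
# The +1 accounts for the action sitting one step before that first row.
vals = df.groupby([df["a"], parts])["b"].transform(lambda s: s - s.min() + 1)

# Rows in partition 0 get NaN; all other rows keep their computed value.
df["diff_b"] = vals.where(parts != 0)
print(df)
```

Note that the `+ 1` step relies on "b" being consecutive, as the question states: the first row of a partition is assumed to be exactly one time unit after the action that opened it.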
