import pandas as pd

df = pd.DataFrame([[11, 'b', 10, '2020-01-05'],
                   [11, 'c', 4, '2020-01-02'],
                   [11, 'a', 6, '2020-01-01'],
                   [22, 'c', 2, '2020-01-13'],
                   [22, 'a', 8, '2020-01-05'],
                   [33, 'b', 2, '2020-01-09'],
                   [33, 'd', 6, '2020-01-05'],
                   [33, 'a', 8, '2020-01-01']],
                  columns=['user', 'lecture', 'not', 'date'])
The output will then be:
user lecture not date
0 11 b 10 2020-01-05
1 11 c 4 2020-01-02
2 11 a 6 2020-01-01
3 22 c 2 2020-01-13
4 22 a 8 2020-01-05
5 33 b 2 2020-01-09
6 33 d 6 2020-01-05
7 33 a 8 2020-01-01
I want the average of 'not' for each user, but for each row it should be the average over that row's date and all of the user's previous dates. The result should look like this:
user lecture not date avg
0 11 b 10 2020-01-05 6.666667 ((10+4+6)/3)
1 11 c 4 2020-01-02 5 ((4+6)/2)
2 11 a 6 2020-01-01 6
3 22 c 2 2020-01-13 5 ((2+8)/2)
4 22 a 8 2020-01-05 8
5 33 b 2 2020-01-09 5.333333 ((2+6+8)/3)
6 33 d 6 2020-01-05 7 ((6+8)/2)
7 33 a 8 2020-01-01 8
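The requirement can be stated directly: for each row, average 'not' over the same user's rows dated on or before that row's date. A brute-force sketch of that definition (an O(n^2) illustration rather than an efficient solution; the helper name avg_up_to is hypothetical) reproduces the expected avg column without depending on how the rows are ordered:

```python
import pandas as pd

df = pd.DataFrame([[11, 'b', 10, '2020-01-05'],
                   [11, 'c', 4, '2020-01-02'],
                   [11, 'a', 6, '2020-01-01'],
                   [22, 'c', 2, '2020-01-13'],
                   [22, 'a', 8, '2020-01-05'],
                   [33, 'b', 2, '2020-01-09'],
                   [33, 'd', 6, '2020-01-05'],
                   [33, 'a', 8, '2020-01-01']],
                  columns=['user', 'lecture', 'not', 'date'])

# Parse the date strings so comparisons are chronological
dates = pd.to_datetime(df['date'])


def avg_up_to(i):
    # Hypothetical helper: rows of the same user dated on or before row i's date
    mask = (df['user'] == df.at[i, 'user']) & (dates <= dates.at[i])
    return df.loc[mask, 'not'].mean()


df['avg'] = [avg_up_to(i) for i in df.index]
print(df)
```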
I tried some lambda-based code, but couldn't get the result:
grouped = df.sort_values(['user'], ascending=False).groupby('user', as_index=False).apply(lambda x: x.reset_index(drop=True))
grouped['count'] = grouped.groupby('user')['not'].transform(lambda x: x.count() - 1)
grouped['mean'] = grouped.groupby('user')['not'].transform(lambda x: x.shift(1).sum() / len(x))
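Try a reversed expanding mean. Within each user the rows are ordered newest date first, so reversing each group and taking an expanding (running) mean averages every row with all of that user's earlier-dated rows: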
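df['avg'] = (
    df.groupby('user')['not']
      # reverse each user's rows (oldest first), then take the running mean
      .apply(lambda g: g[::-1].expanding().mean())
      # drop the 'user' level so the result aligns with df's original index
      .droplevel(0)
)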
Or
df['avg'] = (
    df.loc[::-1, 'not'].groupby(df['user']).expanding().mean().droplevel(0)
)
Printing df gives:
user lecture not date avg
0 11 b 10 2020-01-05 6.666667
1 11 c 4 2020-01-02 5.000000
2 11 a 6 2020-01-01 6.000000
3 22 c 2 2020-01-13 5.000000
4 22 a 8 2020-01-05 8.000000
5 33 b 2 2020-01-09 5.333333
6 33 d 6 2020-01-05 7.000000
7 33 a 8 2020-01-01 8.000000
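Both variants rely on each user's rows already being listed newest date first. If that ordering isn't guaranteed, a sketch under the assumption that it is acceptable to sort by the parsed date explicitly is to order rows oldest first and take a per-user expanding mean. This assumes each user has at most one row per date; rows sharing a date would get different running averages depending on their order, unlike the strict "all previous dates" definition.

```python
import pandas as pd

df = pd.DataFrame([[11, 'b', 10, '2020-01-05'],
                   [11, 'c', 4, '2020-01-02'],
                   [11, 'a', 6, '2020-01-01'],
                   [22, 'c', 2, '2020-01-13'],
                   [22, 'a', 8, '2020-01-05'],
                   [33, 'b', 2, '2020-01-09'],
                   [33, 'd', 6, '2020-01-05'],
                   [33, 'a', 8, '2020-01-01']],
                  columns=['user', 'lecture', 'not', 'date'])

# Parse the date strings so sorting is chronological rather than textual
dates = pd.to_datetime(df['date'])

# Reorder rows oldest first; a stable sort keeps same-date rows in their original order
ordered = df.loc[dates.sort_values(kind='mergesort').index]

# Running mean within each user; transform keeps the original row labels,
# so assigning back to df re-aligns each value with its row
df['avg'] = ordered.groupby('user')['not'].transform(lambda s: s.expanding().mean())
print(df)
```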
I used a for-loop to meet the requirement. df.loc[row, col] selects values by row and column label, and is used here both to filter the rows and to write each result:
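# Initialize with NaN so the column stays numeric ('' would make it object dtype)
df['avg'] = float('nan')
for i in range(len(df)):
    # Rows from label i downward (assumes the default RangeIndex) for the same user;
    # with each user's rows ordered newest first, these are this date and all earlier ones
    temp = df.loc[i:, 'not'][df.loc[i:, 'user'] == df.loc[i, 'user']]
    df.loc[i, 'avg'] = sum(temp) / len(temp)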
The resulting df has the same avg values as the expected output in the question.
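The loop rescans the rest of the frame for every row, which is O(n^2). A vectorized sketch of the same idea, assuming (like the loop) that each user's rows are ordered newest date first, walks the frame bottom up and divides a per-user running sum by a per-user running count:

```python
import pandas as pd

df = pd.DataFrame([[11, 'b', 10, '2020-01-05'],
                   [11, 'c', 4, '2020-01-02'],
                   [11, 'a', 6, '2020-01-01'],
                   [22, 'c', 2, '2020-01-13'],
                   [22, 'a', 8, '2020-01-05'],
                   [33, 'b', 2, '2020-01-09'],
                   [33, 'd', 6, '2020-01-05'],
                   [33, 'a', 8, '2020-01-01']],
                  columns=['user', 'lecture', 'not', 'date'])

# Reverse the frame so each user's rows run oldest date first
rev = df.iloc[::-1]
g = rev.groupby('user')['not']

# Running sum / running row count = running mean; both results keep df's
# row labels, so the division and the assignment align on the index
df['avg'] = g.cumsum() / (g.cumcount() + 1)
print(df)
```

This computes the same values as the loop in a single pass per column, and unlike the loop it does not depend on df having a default RangeIndex.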