
Pandas Group By and Count

A pandas DataFrame df has 3 columns:

user_id, session, revenue

What I want to do now is group df by unique user_id and derive 2 new columns: one called number_sessions (the number of sessions associated with a particular user_id) and another called number_transactions (the number of rows in the revenue column with a value > 0 for each user_id). How do I go about doing this?

I tried doing something like this:

df.groupby('user_id')['session', 'revenue'].agg({'number sessions': lambda x: len(x.session), 
'number_transactions': lambda x: len(x[x.revenue>0])})

I think you can use:

import pandas as pd

df = pd.DataFrame({'user_id':['a','a','s','s','s'],
                   'session':[4,5,4,5,5],
                   'revenue':[-1,0,1,2,1]})

print (df)
   revenue  session user_id
0       -1        4       a
1        0        5       a
2        1        4       s
3        2        5       s
4        1        5       s

a = df.groupby('user_id') \
      .agg({'session': len, 'revenue': lambda x: len(x[x>0])}) \
      .rename(columns={'session':'number sessions','revenue':'number_transactions'})
print (a)
         number sessions  number_transactions
user_id                                      
a                      2                    0
s                      3                    3

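# Alternative: rename inside agg with a nested dict.
# Nested-dict renaming was deprecated in pandas 0.20 and removed in 1.0,
# so this form only runs on older pandas versions.
a = df.groupby('user_id') \
      .agg({'session':{'number sessions': len}, 
            'revenue':{'number_transactions': lambda x: len(x[x>0])}}) 
a.columns = a.columns.droplevel()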

print (a)
         number sessions  number_transactions
user_id                                      
a                      2                    0
s                      3                    3
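
On pandas 0.25 or later, the same grouping and renaming can probably be written in one step with named aggregation. The following is a minimal sketch, not part of the original answer: it assumes the same sample df, uses underscores in both output names so they are valid keyword arguments, and the variable name result is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'user_id': ['a', 'a', 's', 's', 's'],
                   'session': [4, 5, 4, 5, 5],
                   'revenue': [-1, 0, 1, 2, 1]})

# Named aggregation: output_column=(input_column, aggregation).
result = df.groupby('user_id').agg(
    # 'size' counts every row in the group, like len in the answer above
    number_sessions=('session', 'size'),
    # summing a boolean mask counts the rows with revenue > 0
    number_transactions=('revenue', lambda x: (x > 0).sum()),
)
print(result)
```

With the sample data, user a should get 2 sessions and 0 transactions, and user s should get 3 and 3, matching the output above.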

I'd use nunique for session so that the same session is not double-counted for a particular user:

# Nested-dict renaming in agg requires pandas < 1.0 (removed in 1.0).
funcs = dict(session={'number sessions': 'nunique'},
             revenue={'number transactions': lambda x: x.gt(0).sum()})
df.groupby('user_id').agg(funcs)


setup

df = pd.DataFrame({'user_id':['a','a','s','s','s'],
                   'session':[4,5,4,5,5],
                   'revenue':[-1,0,1,2,1]})
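
For readers on pandas 0.25 or later, the nunique idea above can be sketched with named aggregation instead of the nested dict. This is an illustrative rewrite under that assumption, not the answerer's own code; the underscored column names and the variable name unique_counts are chosen here.

```python
import pandas as pd

df = pd.DataFrame({'user_id': ['a', 'a', 's', 's', 's'],
                   'session': [4, 5, 4, 5, 5],
                   'revenue': [-1, 0, 1, 2, 1]})

unique_counts = df.groupby('user_id').agg(
    # nunique counts distinct session values, so repeats are counted once
    number_sessions=('session', 'nunique'),
    # gt(0) builds a boolean mask; its sum is the number of positive rows
    number_transactions=('revenue', lambda x: x.gt(0).sum()),
)
print(unique_counts)
```

Note the difference from a plain row count: user s has three rows but only two distinct sessions (4 and 5), so number_sessions is 2 here rather than 3.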
