簡體   English   中英

Pandas Group by 多列到一列

[英]Pandas Group by on multiple columns to one column

給定一個 df

session_id  article session_type
  1         a       req
  1         b       req
  1         null    action
  2         home    req
  2         h       req
  2         j       req
  2         home    req
  3         home    req
  3         home    req
  3         r       req
  3         home    req

我想匯總到一列作為字典: 1. 獨特的文章 2.unique session_type 3. 計算所有不連續的家

輸出:

sess_id agg_col
1      {unique_articles:2,unique_promotion_session:2,non_consectutive_home:0}
2      {unique_articles:2,unique_promotion_session:1,non_consectutive_home:2}
3      {unique_articles:1,unique_promotion_session:1,non_consectutive_home:1}

謝謝。

用:

#compare home to mask
m = df['article'].eq('home')
#create consecutive groups and filter only by mask home groups
s = m.ne(m.groupby(df['session_id']).shift()).cumsum()[m]
#counts number of groups, compare by 1 for unique home groups
df['home'] = s.map(s.value_counts()).eq(1).astype(int)
#repalce home and null to NaNs for omit this values
df['article']  = df['article'].mask(m | df['article'].eq('null'))
df['home'] = df['home'].fillna(0).astype(int)

#aggregtae number of unique values with omit NaNs and sum for count 1
df1 = df.groupby('session_id').agg({'article':'nunique',
                                    'session_type':'nunique',
                                    'home':'sum'})
df1 = df1.rename(columns={'article':'unique_articles',
                          'session_type':'unique_promotion_session',
                          'home':'non_consectutive_home'})
print (df1)
            unique_articles  unique_promotion_session  non_consectutive_home
session_id                                                                  
1                         2                         2                      0
2                         2                         1                      2
3                         1                         1                      1

#create DaatFrame filled by dicts
d = df1.to_dict('index')

df2 = pd.DataFrame({'sess_id': list(d.keys()),
                    'agg_col': list(d.values())})
print (df2)
   sess_id                                            agg_col
0        1  {'unique_articles': 2, 'unique_promotion_sessi...
1        2  {'unique_articles': 2, 'unique_promotion_sessi...
2        3  {'unique_articles': 1, 'unique_promotion_sessi...

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM