
pandas - unstack with most frequent values in MultiIndex DataFrame

I've got this sample DataFrame df:

GridCode,User,DLang
3,224591119,es
3,224591119,ja
3,224591119,zh
4,224591119,es
6,146381773,en
9,17925282,ca

I would like to group by the User field, keeping only each user's most frequent DLang code, then unstack and count the number of users in each GridCode. So far I did:

d = df.groupby(['GridCode','DLang']).size().unstack().fillna(0)

which correctly returns:

DLang     ca  en  es  ja  zh
GridCode                    
3          0   0   1   1   1
4          0   0   1   0   0
6          0   1   0   0   0
9          1   0   0   0   0

However, as you can see in df, some users have multiple DLang entries (e.g. User 224591119), but I only want to count their most frequent DLang code (for that user, es). The resulting DataFrame would be:

DLang     ca  en  es
GridCode                    
3          0   0   1
4          0   0   1
6          0   1   0
9          1   0   0

First, count how many times each user used each DLang, aggregating across GridCode.

g = df.groupby(['User','DLang']).count().reset_index()
g = g.rename(columns={'GridCode':'occurrences'})

Then keep, for each user, the DLang with the highest occurrence count. Use idxmax() for this rather than first(), which only returns each user's first row; on ties, idxmax() keeps the DLang that sorts first.

h = g.loc[g.groupby('User')['occurrences'].idxmax()]
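
A quick check on this step, using made-up counts (the users and values below are hypothetical, not taken from the question): groupby().first() returns each group's first row, which is the most frequent DLang only when that row happens to come first, whereas idxmax() selects the largest count explicitly. A minimal sketch of the difference:

```python
import pandas as pd

# Hypothetical counts: user 1 used 'en' once and 'fr' three times.
g = pd.DataFrame({
    'User': [1, 1, 2],
    'DLang': ['en', 'fr', 'de'],
    'occurrences': [1, 3, 2],
})

# first() takes the first row of each group, regardless of the count.
by_first = g.groupby('User').first().reset_index()

# idxmax() returns the index label of the largest count in each group.
by_max = g.loc[g.groupby('User')['occurrences'].idxmax()].reset_index(drop=True)

print(by_first)  # user 1 is assigned 'en'
print(by_max)    # user 1 is assigned 'fr'
```

On the sample data in the question both approaches agree, because es sorts before ja and zh and is therefore the first row for that user.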

Merge the per-user most frequent DLang back into the original input. The default inner join drops every row where a user used a DLang other than their most frequent one.

j = pd.merge(df,h, on=['User','DLang'])
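
To see which rows the inner join discards, one option is a left merge with indicator=True, which adds a _merge column marking unmatched rows as left_only. A small sketch on hypothetical data (a single made-up user with two languages):

```python
import pandas as pd

# Hypothetical input: one user, 'es' twice and 'ja' once.
df = pd.DataFrame({
    'GridCode': [3, 3, 4],
    'User': [1, 1, 1],
    'DLang': ['es', 'ja', 'es'],
})
# That user's most frequent language.
h = pd.DataFrame({'User': [1], 'DLang': ['es']})

# Inner join (the default): only rows matching (User, DLang) in h survive.
inner = df.merge(h, on=['User', 'DLang'])

# Left join with an indicator column, to inspect what the inner join dropped.
check = df.merge(h, on=['User', 'DLang'], how='left', indicator=True)
dropped = check[check['_merge'] == 'left_only']

print(inner)
print(dropped)
```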

Finally, count the remaining rows per GridCode and DLang, then unstack to get the final table.

final_df = j.groupby(['GridCode','DLang']).size().unstack(fill_value=0)

DLang     ca  en  es
GridCode            
3          0   0   1
4          0   0   1
6          0   1   0
9          1   0   0
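
For reference, the steps above can be put together into one runnable sketch on the sample data from the question. It uses size() with reset_index(name=...) in place of count() plus rename, and idxmax() to pick the most frequent DLang per user; these are choices made for this sketch, not the only way to write it.

```python
import io
import pandas as pd

# Sample data copied from the question.
csv_text = """GridCode,User,DLang
3,224591119,es
3,224591119,ja
3,224591119,zh
4,224591119,es
6,146381773,en
9,17925282,ca
"""
df = pd.read_csv(io.StringIO(csv_text))

# 1. Count how often each user used each language, across all grid cells.
g = df.groupby(['User', 'DLang']).size().reset_index(name='occurrences')

# 2. Keep each user's most frequent language (ties go to the first row).
h = g.loc[g.groupby('User')['occurrences'].idxmax(), ['User', 'DLang']]

# 3. Inner merge: drops rows where a user used any other language.
j = df.merge(h, on=['User', 'DLang'])

# 4. Count rows per grid cell and language; fill_value=0 keeps integer counts.
final_df = j.groupby(['GridCode', 'DLang']).size().unstack(fill_value=0)
print(final_df)
```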


 