
Applying pandas groupby for each index

I have a dataframe with a person's name as the index (a name can appear more than once) and two columns, 'X' and 'Y'. Each of 'X' and 'Y' can hold any letter from A to C.

For example:

import pandas as pd

df = pd.DataFrame({'X': ['A', 'B', 'A', 'C'], 'Y': ['B', 'A', 'A', 'C']}, index=['Bob', 'Bob', 'John', 'Mike'])

For each person (i.e. each index value) I would like to get the number of occurrences of every unique combination of columns 'X' and 'Y'. For example, for Bob I have 1 count of ('A','B') and 1 count of ('B','A').

When I do the following:

df.loc['Bob'].groupby(['X','Y']).size() 

I get the correct results for Bob. How can I do this for each person without a loop? Ideally, I would get a dataframe with the different people as the index, every unique combination of columns 'X' and 'Y' as the columns, and the number of times each combination appeared for that person as the values.

    ('A','A') ('A','B') ('A','C') ('B','A') ... ('C','C')
Bob     0         1         0         1             0
John    1         0         0         0             0
Mike    0         0         0         0             1

Using get_dummies and groupby:

pd.get_dummies(df.apply(tuple, axis=1)).groupby(level=0).sum()

      (A, A)  (A, B)  (B, A)  (C, C)
Bob        0       1       1       0
John       1       0       0       0
Mike       0       0       0       1
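
The one-liner above packs three steps together, so a step-by-step version may be easier to follow. This is only a sketch of the same idea, not the answerer's exact code: it builds the (X, Y) tuples with zip instead of df.apply, and the names pairs, one_hot and counts are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'X': ['A', 'B', 'A', 'C'], 'Y': ['B', 'A', 'A', 'C']},
    index=['Bob', 'Bob', 'John', 'Mike'],
)

# Step 1: one (X, Y) tuple per row, keeping the person's name as the index.
pairs = pd.Series(list(zip(df['X'], df['Y'])), index=df.index)

# Step 2: one indicator column per distinct tuple that occurs in the data.
one_hot = pd.get_dummies(pairs)

# Step 3: add up the indicator rows that share the same index label (person).
counts = one_hot.groupby(level=0).sum()
print(counts)
```

Only combinations that actually occur in the data get a column here, which is why pairs such as ('A', 'C') are missing from the result.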

I think you can use:

# convert columns X and Y to tuples
df['tup'] = list(zip(df.X, df.Y))

# count rows per (index, tuple) pair, then pivot the tuples into columns
df1 = df.reset_index().groupby(['index', 'tup']).size().unstack(fill_value=0)
print(df1)
tup    (A, A)  (A, B)  (B, A)  (C, C)
index                                
Bob         0       1       1       0
John        1       0       0       0
Mike        0       0       0       1

# get every possible combination of the observed X and Y values
from itertools import product
comb = list(product(df.X.unique(), df.Y.unique()))
print(comb)
[('A', 'B'), ('A', 'A'), ('A', 'C'), ('B', 'B'), ('B', 'A'), 
 ('B', 'C'), ('C', 'B'), ('C', 'A'), ('C', 'C')]

# reindex the columns by these combinations, filling missing ones with 0
print(df1.reindex(columns=comb, fill_value=0))
tup    (A, B)  (A, A)  (A, C)  (B, B)  (B, A)  (B, C)  (C, B)  (C, A)  (C, C)
index                                                                        
Bob         1       0       0       0       1       0       0       0       0
John        0       1       0       0       0       0       0       0       0
Mike        0       0       0       0       0       0       0       0       1
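
The same count-then-reshape idea can also be written without a tuple column, by grouping on the index together with 'X' and 'Y' and unstacking both letter levels. The sketch below is a variant, not the answer's code: the function name count_pairs, the index name "person" and the letters parameter are assumptions introduced for illustration, and the columns come out as a two-level (X, Y) MultiIndex rather than as tuples.

```python
import pandas as pd


def count_pairs(df, letters=("A", "B", "C")):
    """Count (X, Y) combinations per index label (hypothetical helper)."""
    counts = (
        df.rename_axis("person")           # name the index so groupby can refer to it
          .groupby(["person", "X", "Y"])   # one group per person and letter pair
          .size()                          # number of rows in each group
          .unstack(["X", "Y"], fill_value=0)  # move the letter pairs into the columns
    )
    # Every possible pair of letters, including pairs that never occur.
    all_pairs = pd.MultiIndex.from_product([letters, letters], names=["X", "Y"])
    return counts.reindex(columns=all_pairs, fill_value=0)


df = pd.DataFrame(
    {"X": ["A", "B", "A", "C"], "Y": ["B", "A", "A", "C"]},
    index=["Bob", "Bob", "John", "Mike"],
)
result = count_pairs(df)
print(result)
```

Passing the full set of letters explicitly means a letter that never appears in the data still gets its columns, whereas building the combinations from df.X.unique() and df.Y.unique() only covers letters that were observed.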
