
Group by column in Pandas and count unique values in each group

I'm trying to use groupby in pandas to group by a variable column and count the number of times each value shows up in each group.

For example, using this DataFrame:

import pandas as pd

d = {'Period': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4], 
'Result': ['True','True','False','True','False','True','False','True','True','False','False','True','False','True','False','False']}
df = pd.DataFrame(data=d)
df.sort_values(by=['Period'], inplace=True)
print(df)

I'd like to count how many times 'True' or 'False' shows up in each period, outputting something like this:

Period
1 : 2 True, 2 False
2 : 3 True, 1 False
3 : 0 True, 4 False
4 : 3 True, 1 False
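One way to get exactly that text layout is to build a count table first and then format each row. The sketch below uses pd.crosstab and assumes, as in the question, that Result holds the strings 'True' and 'False' rather than booleans; the names counts and lines are illustrative only.

```python
import pandas as pd

# Same data as in the question (Result values are strings, not booleans)
d = {'Period': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
     'Result': ['True', 'True', 'False', 'True', 'False', 'True', 'False', 'True',
                'True', 'False', 'False', 'True', 'False', 'True', 'False', 'False']}
df = pd.DataFrame(data=d)

# Count each Result value per Period; combinations that never occur become 0
counts = pd.crosstab(df['Period'], df['Result'])

# Fix the column order and guarantee both columns exist, even if a value is absent
counts = counts.reindex(columns=['True', 'False'], fill_value=0)

# Format one line per period in the "1 : 2 True, 2 False" style
lines = [f"{period} : {row['True']} True, {row['False']} False"
         for period, row in counts.iterrows()]
print('Period')
print('\n'.join(lines))
```

Building the numeric table first keeps the counting and the presentation separate, so the same counts can also be reused for further analysis.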

The problem I'm having is that none of the methods in the examples I found do quite that:

.count() alone just counts the number of entries in each period.

.nunique() returns the number of unique entries.

.unique() returns the unique entries that exist but doesn't count them.

If you run this full example code, you'll see what I mean:

import pandas as pd

# Create the DataFrame
d = {'Period': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4], 
'Result': ['True','True','False','True','False','True','False','True','True','False','False','True','False','True','False','False']}
df = pd.DataFrame(data=d)
df.sort_values(by=['Period'], inplace=True)
print(df)

# Group by Period and print each attempt
print(df.groupby('Period')['Result'].count())
print(df.groupby('Period')['Result'].nunique())
print(df.groupby('Period')['Result'].unique())
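To make the difference concrete, here is a small check of what each of those three calls returns for this data (a sketch; grouped is just a local name chosen for illustration).

```python
import pandas as pd

d = {'Period': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
     'Result': ['True', 'True', 'False', 'True', 'False', 'True', 'False', 'True',
                'True', 'False', 'False', 'True', 'False', 'True', 'False', 'False']}
df = pd.DataFrame(data=d)
grouped = df.groupby('Period')['Result']

# count(): number of non-null entries per period (4 in every group here)
print(grouped.count().tolist())
# nunique(): number of distinct values per period (period 3 only ever has 'False')
print(grouped.nunique().tolist())
# unique(): the distinct values themselves, with no counts attached
print(grouped.unique().tolist())
```

None of the three reports how often each individual value occurs, which is why a per-value count needs a different tool.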

Use pd.crosstab:

print(pd.crosstab(df["Period"], df["Result"]))

Prints:

Result  False  True
Period             
1           2     2
2           1     3
3           4     0
4           1     3
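pd.crosstab also accepts a couple of options that may be handy here: margins=True appends an 'All' row and column with totals, and normalize='index' turns each row into proportions instead of raw counts. A short sketch on the same data (totals and props are illustrative names):

```python
import pandas as pd

d = {'Period': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
     'Result': ['True', 'True', 'False', 'True', 'False', 'True', 'False', 'True',
                'True', 'False', 'False', 'True', 'False', 'True', 'False', 'False']}
df = pd.DataFrame(data=d)

# Counts plus an 'All' row and column holding the totals
totals = pd.crosstab(df['Period'], df['Result'], margins=True)
print(totals)

# Share of 'True'/'False' within each period (each row sums to 1)
props = pd.crosstab(df['Period'], df['Result'], normalize='index')
print(props)
```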

Using collections.Counter:

from collections import Counter

df.groupby('Period')['Result'].apply(Counter).fillna(0).unstack()

Output:

        True  False
Period             
1        2.0    2.0
2        3.0    1.0
3        0.0    4.0
4        3.0    1.0
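The apply(Counter) call relies on pandas expanding the dict returned for each group into rows. A variant that builds the table explicitly, one Counter per period, and casts the result back to integers could look like this (a sketch; per_period is an illustrative name, not part of the original answer):

```python
from collections import Counter

import pandas as pd

d = {'Period': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
     'Result': ['True', 'True', 'False', 'True', 'False', 'True', 'False', 'True',
                'True', 'False', 'False', 'True', 'False', 'True', 'False', 'False']}
df = pd.DataFrame(data=d)

# Iterating a SeriesGroupBy yields (period, Series) pairs; count each group's values
per_period = {period: Counter(values)
              for period, values in df.groupby('Period')['Result']}

# Rows are periods, columns are Result values; a value missing from a period's
# Counter becomes NaN, so fill it with 0 and cast back to int
counts = (pd.DataFrame.from_dict(per_period, orient='index')
          .fillna(0)
          .astype(int)
          .sort_index())
print(counts)
```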

Using value_counts:

df.groupby('Period')['Result'].value_counts().unstack().fillna(0)

Output:

Result  False  True
Period             
1         2.0   2.0
2         1.0   3.0
3         4.0   0.0
4         1.0   3.0
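The .0 floats in that output appear because value_counts has no row for the (3, 'True') combination, so unstack() inserts NaN there and forces a float dtype before fillna replaces it. Passing fill_value=0 to unstack fills the gap during the reshape, so the counts should stay integers:

```python
import pandas as pd

d = {'Period': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
     'Result': ['True', 'True', 'False', 'True', 'False', 'True', 'False', 'True',
                'True', 'False', 'False', 'True', 'False', 'True', 'False', 'False']}
df = pd.DataFrame(data=d)

# value_counts gives a Series indexed by (Period, Result); unstack pivots Result
# into columns and fill_value=0 fills missing combinations without a float cast
counts = df.groupby('Period')['Result'].value_counts().unstack(fill_value=0)
print(counts)
```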

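Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.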

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM