How to get minimum number of occurrences of value in pandas groupby
   home_team_name  home_team_goal_count
0  Bayern München                     2
1  Bayern München                     2
2  Bayern München                     1
3            Köln                     2
4            Köln                     2
I group the data by the variable home_team_name.
df.groupby("home_team_name")
The values of home_team_goal_count can only be 2 or 1. I want to get the minimum number of occurrences of these values in each group. The result I want is 1 for Bayern München and 0 for Köln. To illustrate: Bayern München has two 2s and one 1, so the minimum is 1. Köln has two 2s and zero 1s, so the minimum is 0.
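For anyone who wants to reproduce this, here is a minimal sketch of the sample data and the result I am after. The frame is copied from the table above; the variable name `expected` is only a placeholder for illustration.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "home_team_name": ["Bayern München"] * 3 + ["Köln"] * 2,
    "home_team_goal_count": [2, 2, 1, 2, 2],
})

# Desired result, written out by hand from the description above
# ("expected" is a hypothetical name used only for this illustration)
expected = pd.Series({"Bayern München": 1, "Köln": 0})

print(df)
print(expected)
```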
First count the values with SeriesGroupBy.value_counts, then reshape with unstack, filling in 0 for every missing combination of 1 and 2, and finally take the row-wise minimum with min:
s = (df.groupby("home_team_name")['home_team_goal_count']
       .value_counts()
       .unstack(fill_value=0)
       .min(axis=1))
print(s)
home_team_name
Bayern München 1
Köln 0
dtype: int64
Details:
print(df.groupby("home_team_name")['home_team_goal_count']
        .value_counts()
        .unstack(fill_value=0))
home_team_goal_count  1  2
home_team_name
Bayern München        1  2
Köln                  0  2
If the input data may contain only 1s or only 2s, the missing column never appears after unstack, so reindex is needed to add it:
s = (df.groupby("home_team_name")['home_team_goal_count']
       .value_counts()
       .unstack(fill_value=0)
       .reindex([1, 2], axis=1, fill_value=0)
       .min(axis=1))
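To see why the reindex step matters, here is a small sketch on made-up input in which the value 1 never occurs. The frame `df_only_twos` and its contents are assumptions for illustration, not data from the question.

```python
import pandas as pd

# Hypothetical input where the value 1 never occurs in any group
df_only_twos = pd.DataFrame({
    "home_team_name": ["Bayern München", "Bayern München", "Köln"],
    "home_team_goal_count": [2, 2, 2],
})

counts = (df_only_twos.groupby("home_team_name")["home_team_goal_count"]
          .value_counts()
          .unstack(fill_value=0))

# Without reindex there is only a column for 2, so min is just the count of 2s
without_reindex = counts.min(axis=1)

# With reindex a zero-filled column for 1 is added, so min becomes 0
with_reindex = counts.reindex([1, 2], axis=1, fill_value=0).min(axis=1)

print(without_reindex)
print(with_reindex)
```

Without the reindex the result silently reports the count of 2s instead of 0, which is easy to miss because no error is raised.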
Let's try using pd.crosstab:
pd.crosstab(df['home_team_name'], df['home_team_goal_count'])\
  .reindex([1, 2], axis=1, fill_value=0).min(axis=1)
Result:
home_team_name
Bayern München 1
Köln 0
dtype: int64
import pandas as pd
import numpy as np
list1=['Bayern Munchen','Bayern Munchen','Bayern Munchen','FC Koln','FC Koln']
list2=[2,2,1,2,2]
d={'Home Team Name':list1,'Home Team Goal Count':list2}
data=pd.DataFrame(d)
data['Name']= data['Home Team Name'] +" "+ data['Home Team Goal Count'].astype(str)
data['Name']
Out[39]:
0 Bayern Munchen 2
1 Bayern Munchen 2
2 Bayern Munchen 1
3 FC Koln 2
4 FC Koln 2
name,count=np.unique(data['Name'].tolist(),return_counts=True)
name=[' '.join(x.split(' ')[:-1]) for x in name]
name
Out[99]: ['Bayern Munchen', 'Bayern Munchen', 'FC Koln']
min_val=pd.DataFrame({"Name":name,"Count":count})
name=[]
min_val_count=[]
for x in min_val.Name.unique():
    # Column-wise minimum over the rows that do not belong to team x
    other = min_val[min_val.Name != x].min()
    name.append(other["Name"])
    if other["Count"] == 2:
        min_val_count.append(0)
    else:
        min_val_count.append(other["Count"])
minimum_val_dict=dict(zip(name,min_val_count))
minimum_val_dict
Out[104]: {'FC Koln': 0, 'Bayern Munchen': 1}
A slightly longer version compared to the answers above.
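The same counting idea can also be sketched in plain Python with collections.Counter, which does not depend on there being exactly two teams. This is a different technique from the np.unique approach above, and `possible_values` is an assumption taken from the question's statement that only 1 and 2 can occur.

```python
from collections import Counter, defaultdict

list1 = ['Bayern Munchen', 'Bayern Munchen', 'Bayern Munchen', 'FC Koln', 'FC Koln']
list2 = [2, 2, 1, 2, 2]
possible_values = (1, 2)  # assumption: the only goal counts that can occur

# Count how often each goal count occurs for each team
counts = defaultdict(Counter)
for team, goals in zip(list1, list2):
    counts[team][goals] += 1

# A Counter returns 0 for a missing key, so an absent value counts as zero
minimum_val_dict = {
    team: min(c[v] for v in possible_values) for team, c in counts.items()
}
print(minimum_val_dict)
```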
Yet another way to do this is to use a categorical variable, since there is a finite set of states. So:
(
    df
    .astype({"home_team_goal_count": "category"})
    .groupby("home_team_name")["home_team_goal_count"]
    .apply(lambda x: x.value_counts().min())
)
If you want to know which value occurred the least, you can call .idxmin() instead of .min().
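As a rough sketch of both calls side by side, the snippet below uses the sample data from the question. It declares the categories explicitly with pd.CategoricalDtype, which is my own addition: with plain "category" the categories are inferred from the data, so a value that never appears anywhere would not be counted as 0.

```python
import pandas as pd

df = pd.DataFrame({
    "home_team_name": ["Bayern München"] * 3 + ["Köln"] * 2,
    "home_team_goal_count": [2, 2, 1, 2, 2],
})

# Assumption: 1 and 2 are the only possible values, so declare them explicitly
goal_dtype = pd.CategoricalDtype(categories=[1, 2])

grouped = (
    df
    .astype({"home_team_goal_count": goal_dtype})
    .groupby("home_team_name")["home_team_goal_count"]
)

# value_counts on a categorical Series includes categories with zero occurrences
min_counts = grouped.apply(lambda x: x.value_counts().min())
least_common = grouped.apply(lambda x: x.value_counts().idxmin())

print(min_counts)
print(least_common)
```

Note that when several values tie for the minimum, idxmin returns only the first of them.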