Aggregation in pandas dataframe on two separate columns
I am trying to aggregate the following DataFrame on the fields cat1, cat2 and cat3. I need to count the number of trials and the number of unique subjects in each group. The code below finds the number of trials correctly, but the number of subjects is not correct.
import numpy as np
import pandas as pd

mydata = pd.DataFrame(np.array([
    ['Adam', 1, 1, 1, 1],
    ['Adam', 2, 1, 2, 1],
    ['Adam', 3, 2, 2, 3],
    ['Adam', 1, 1, 1, 1],
    ['Adam', 2, 1, 1, 2],
    ['Adam', 3, 1, 2, 1],
    ['Bob', 1, 1, 2, 3],
    ['Bob', 2, 1, 2, 3],
    ['Bob', 3, 1, 1, 1],
    ['Bob', 1, 1, 2, 3],
    ['Bob', 2, 2, 2, 3],
    ['Bob', 3, 1, 3, 1]], dtype=object),
    columns=['ID', 'trial', 'cat1', 'cat2', 'cat3']
)
# Count the non-null entries of every remaining column in each group
grouped = mydata.groupby(['cat1', 'cat2', 'cat3']).agg(['count'])
grouped.reset_index()
Result:
  cat1 cat2 cat3    ID trial
                 count count
0    1    1    1     3     3
1    1    1    2     1     1
2    1    2    1     2     2
3    1    2    3     3     3
4    1    3    1     1     1
5    2    2    3     2     2
The result I am expecting is:
  cat1 cat2 cat3 trial ID
0    1    1    1     3  2
1    1    1    2     1  1
2    1    2    1     2  1
3    1    2    3     3  1
4    1    3    1     1  1
5    2    2    3     2  2
You could aggregate ID with pd.Series.nunique and get the count from trial:
In [215]: (mydata.groupby(['cat1', 'cat2', 'cat3'])
                 .agg({'trial': 'count', 'ID': pd.Series.nunique})
                 .reset_index())
Out[215]:
  cat1 cat2 cat3 trial ID
0    1    1    1     3  2
1    1    1    2     1  1
2    1    2    1     2  1
3    1    2    3     3  1
4    1    3    1     1  1
5    2    2    3     2  2
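For readers on a recent pandas, the same idea can be sketched with named aggregation, which fixes both the output column names and their order. This assumes pandas 0.25 or later; the variable names `rows` and `summary` are only illustrative, and the data is the question's table rebuilt from plain tuples instead of a NumPy object array.

```python
import pandas as pd

# The question's data, rebuilt from plain tuples (no NumPy needed).
rows = [
    ("Adam", 1, 1, 1, 1),
    ("Adam", 2, 1, 2, 1),
    ("Adam", 3, 2, 2, 3),
    ("Adam", 1, 1, 1, 1),
    ("Adam", 2, 1, 1, 2),
    ("Adam", 3, 1, 2, 1),
    ("Bob", 1, 1, 2, 3),
    ("Bob", 2, 1, 2, 3),
    ("Bob", 3, 1, 1, 1),
    ("Bob", 1, 1, 2, 3),
    ("Bob", 2, 2, 2, 3),
    ("Bob", 3, 1, 3, 1),
]
mydata = pd.DataFrame(rows, columns=["ID", "trial", "cat1", "cat2", "cat3"])

# Named aggregation: output_column=(input_column, function).
# "count" tallies non-null rows, "nunique" tallies distinct values.
summary = (
    mydata.groupby(["cat1", "cat2", "cat3"])
    .agg(trial=("trial", "count"), ID=("ID", "nunique"))
    .reset_index()
)
print(summary)
```

The original attempt applied 'count' to every column, so ID received the number of rows in each group rather than the number of distinct subjects; using nunique for ID is what makes the two columns differ.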