Aggregation in pandas dataframe on two separate columns
I am trying to aggregate the following DataFrame over the fields cat1, cat2 and cat3. For each group, I need to count the number of trials and the number of unique subjects. The code below finds the correct number of trials, but the number of subjects is incorrect.
import numpy as np
import pandas as pd

# The original used Python 2 long literals (1L); plain ints work in Python 3.
mydata = pd.DataFrame(np.array([
['Adam', 1, 1, 1, 1],
['Adam', 2, 1, 2, 1],
['Adam', 3, 2, 2, 3],
['Adam', 1, 1, 1, 1],
['Adam', 2, 1, 1, 2],
['Adam', 3, 1, 2, 1],
['Bob', 1, 1, 2, 3],
['Bob', 2, 1, 2, 3],
['Bob', 3, 1, 1, 1],
['Bob', 1, 1, 2, 3],
['Bob', 2, 2, 2, 3],
['Bob', 3, 1, 3, 1]], dtype=object),
columns = ['ID','trial','cat1','cat2','cat3']
)
# 'count' counts rows per group for every remaining column, so ID gets a row count too.
grouped = mydata.groupby(['cat1', 'cat2', 'cat3']).agg(['count'])
grouped.reset_index()
Result:
  cat1 cat2 cat3    ID trial
                 count count
0    1    1    1     3     3
1    1    1    2     1     1
2    1    2    1     2     2
3    1    2    3     3     3
4    1    3    1     1     1
5    2    2    3     2     2
The result I expect is:
cat1 cat2 cat3 trial ID
0 1 1 1 3 2
1 1 1 2 1 1
2 1 2 1 2 1
3 1 2 3 3 1
4 1 3 1 1 1
5 2 2 3 2 2
You can aggregate ID with pd.Series.nunique and take the count from trial:
In [215]: (mydata.groupby(['cat1', 'cat2', 'cat3'])
.agg({'ID': pd.Series.nunique, 'trial': 'count'})
.reset_index())
Out[215]:
cat1 cat2 cat3 trial ID
0 1 1 1 3 2
1 1 1 2 1 1
2 1 2 1 2 1
3 1 2 3 3 1
4 1 3 1 1 1
5 2 2 3 2 2
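As a hedged variant of the same idea, here is a minimal sketch using named aggregation, which assumes pandas 0.25 or later and is not part of the original answer. It rebuilds the sample data from the question as a plain list of tuples (the `rows` and `summary` names are illustrative), and it fixes the output column order explicitly instead of relying on dict ordering.

```python
import pandas as pd

# Sample data from the question, written with plain Python 3 ints.
rows = [
    ("Adam", 1, 1, 1, 1),
    ("Adam", 2, 1, 2, 1),
    ("Adam", 3, 2, 2, 3),
    ("Adam", 1, 1, 1, 1),
    ("Adam", 2, 1, 1, 2),
    ("Adam", 3, 1, 2, 1),
    ("Bob", 1, 1, 2, 3),
    ("Bob", 2, 1, 2, 3),
    ("Bob", 3, 1, 1, 1),
    ("Bob", 1, 1, 2, 3),
    ("Bob", 2, 2, 2, 3),
    ("Bob", 3, 1, 3, 1),
]
mydata = pd.DataFrame(rows, columns=["ID", "trial", "cat1", "cat2", "cat3"])

# Named aggregation: each keyword is an output column,
# each value is a (source column, aggregation function) pair.
summary = (
    mydata.groupby(["cat1", "cat2", "cat3"])
    .agg(
        trial=("trial", "count"),  # number of rows (trials) per group
        ID=("ID", "nunique"),      # number of distinct subjects per group
    )
    .reset_index()
)
print(summary)
```

The string alias "nunique" is equivalent to passing pd.Series.nunique here; the difference from the question's code is only that 'count' counts every row in a group, while 'nunique' counts distinct values.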