Pandas: group by two columns and get the unique count

Question

I have the following dataframe:

   ID       hour                          
  3403       9
  3478       1
  3478       1
  3478       1
  3478       1
  3478       1
  3478       1
  3481       1
  3489       1
  3489       1
  3489       1
  3489       1
  3489       1
  3489       1
  3489       1
  3502       2
  3502       2
  3502       2

I want to get the unique count of IDs for each hour. That is, I want something like this:

count     hour
  1        9
  3        1
  1        2

How can I do this?
All I have done so far is group by both hour and ID, like this:

df.groupby(['hour', 'CONVERSATIONID'])

But I don't know how to proceed from here.

Answer 1

import pandas as pd

# Sample input data
d = {'ID': [3403, 3478, 3478, 3481, 3502, 3502], 'Hour': [9, 1, 1, 1, 2, 2]}
df = pd.DataFrame(data=d)
# Drop fully duplicated rows so that each (ID, Hour) pair appears only once
df = df.drop_duplicates()
# Group by Hour and count the remaining IDs, which are now unique within each hour
df = df.groupby('Hour')['ID'].count().reset_index(name='count')

Answer 2

You can simply use groupby and then count:

df.groupby(['Hour','ID']).size().reset_index().groupby('Hour').Hour.value_counts()
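
The one-liner above can be easier to follow when split into its two stages. The following is a minimal sketch, not the answerer's exact code: it reuses the sample data from Answer 1, the names `pairs` and `unique_per_hour` are introduced here only for illustration, and the final `value_counts()` is swapped for a plain `size()`, which yields the same per-hour counts with a single-level index.

```python
import pandas as pd

# Sample data taken from Answer 1 (column names 'ID' and 'Hour').
df = pd.DataFrame({"ID": [3403, 3478, 3478, 3481, 3502, 3502],
                   "Hour": [9, 1, 1, 1, 2, 2]})

# Stage 1: collapse to one row per distinct (Hour, ID) pair.
# The size column produced here is not used afterwards.
pairs = df.groupby(["Hour", "ID"]).size().reset_index()

# Stage 2: count rows per Hour; each remaining row is one distinct ID.
unique_per_hour = pairs.groupby("Hour").size()
print(unique_per_hour)
```

Because stage 1 leaves exactly one row per (Hour, ID) pair, counting rows in stage 2 is equivalent to counting unique IDs per hour.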

Answer 3

This might work:

df.groupby(['hour']).agg(count=('ID', 'nunique')).reset_index()
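
As a self-contained check, here is a sketch that applies the `nunique` aggregation to the data from the question. The `sort=False` argument and the final column reordering are additions made here, assuming the goal is to reproduce the row order (9, 1, 2) and column order (count, hour) shown in the question; they are not part of the answer above.

```python
import pandas as pd

# Data copied from the question: 18 rows of (ID, hour).
df = pd.DataFrame({
    "ID": [3403] + [3478] * 6 + [3481] + [3489] * 7 + [3502] * 3,
    "hour": [9] + [1] * 6 + [1] + [1] * 7 + [2] * 3,
})

# sort=False keeps the hours in order of first appearance (9, 1, 2)
# instead of sorting them ascending.
result = (
    df.groupby("hour", sort=False)
      .agg(count=("ID", "nunique"))   # number of distinct IDs per hour
      .reset_index()[["count", "hour"]]
)
print(result)
```

Named aggregation (`count=("ID", "nunique")`) requires pandas 0.25 or later; on older versions, `df.groupby("hour")["ID"].nunique()` gives the same counts as a Series.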

3 answers

Answer 1 0 2020-03-13 17:56:54

Answer 2 0 2020-03-13 18:07:25

Answer 3 0 2020-03-13 18:13:14
