Get unique count and max value in a pandas DataFrame groupby

Question

Using the pandas DataFrame groupby feature, I want to group by column c_b and (1) count the unique values in columns c_a and c_c, and (2) get the max value of column c_d. Is there a way to write a single line of groupby code that achieves both goals? I tried the following line of code, but it does not seem to be correct.

sampleGroup = sample.groupby('c_b')(['c_a', 'c_d'].agg(pd.Series.nunique), ['c_d'].agg(pd.Series.max))

My expected results are:

c_b,c_a_unique_count,c_c_unique_count,c_d_max
python,2,2,1.0
c++,2,2,0.0

Thanks.

Input file:

c_a,c_b,c_c,c_d
hello,python,numpy,0.0
hi,python,pandas,1.0
ho,c++,vector,0.0
ho,c++,std,0.0
go,c++,std,0.0
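To make the question reproducible without 123.csv, here is a minimal sketch that inlines the input above with io.StringIO (an assumption for illustration; the question reads from disk). Because the first row already holds the column names, it can be read as the header directly rather than skipped and renamed. Keeping c_d as float, instead of casting it to int64 as the question's code does, is what lets the max appear as 1.0 and 0.0 like in the expected output.

```python
import io

import pandas as pd

# The question's input file, inlined so the example runs without 123.csv
CSV_TEXT = """c_a,c_b,c_c,c_d
hello,python,numpy,0.0
hi,python,pandas,1.0
ho,c++,vector,0.0
ho,c++,std,0.0
go,c++,std,0.0
"""

# Use the first row as the header and set dtypes by column name
sample = pd.read_csv(
    io.StringIO(CSV_TEXT),
    dtype={"c_a": str, "c_b": str, "c_c": str, "c_d": float},
)
print(sample)
```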

Source code:

import pandas as pd

sample = pd.read_csv('123.csv', header=None, skiprows=1,
    dtype={0:str, 1:str, 2:str, 3:float})
sample.columns = pd.Index(data=['c_a', 'c_b', 'c_c', 'c_d'])
sample['c_d'] = sample['c_d'].astype('int64')
sampleGroup = sample.groupby('c_b')(['c_a', 'c_d'].agg(pd.Series.nunique), ['c_d'].agg(pd.Series.max))
results.to_csv(sampleGroup, index= False)

Answer 1

You can pass a dict to agg():

df.groupby('c_b').agg({'c_a':'nunique', 'c_c':'nunique', 'c_d':'max'})
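As a runnable sketch of the dict approach on the question's data (inlined with io.StringIO, which is an assumption for illustration), each key names a column and each value names the aggregation to apply to it. The grouping column c_b becomes the index, and groupby sorts the groups by default, so c++ comes before python.

```python
import io

import pandas as pd

df = pd.read_csv(io.StringIO("""c_a,c_b,c_c,c_d
hello,python,numpy,0.0
hi,python,pandas,1.0
ho,c++,vector,0.0
ho,c++,std,0.0
go,c++,std,0.0
"""))

# One aggregation per column: unique counts for c_a and c_c, max for c_d
result = df.groupby("c_b").agg({"c_a": "nunique", "c_c": "nunique", "c_d": "max"})
print(result)
```

Note that the output columns keep their original names (c_a, c_c, c_d) rather than the names in the expected result.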

If you don't want c_b as the index, pass as_index=False to groupby:

df.groupby('c_b', as_index=False).agg({'c_a':'nunique', 'c_c':'nunique', 'c_d':'max'})
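If you also want the exact column names from the expected result, one option is named aggregation (keyword arguments to agg(), available since pandas 0.25); this is an alternative to the dict form above, not part of the original answer. In this sketch, sort=False keeps groups in order of first appearance (python, then c++), and reset_index() turns c_b back into a regular column. The data is inlined with io.StringIO as an assumption.

```python
import io

import pandas as pd

df = pd.read_csv(io.StringIO("""c_a,c_b,c_c,c_d
hello,python,numpy,0.0
hi,python,pandas,1.0
ho,c++,vector,0.0
ho,c++,std,0.0
go,c++,std,0.0
"""))

# Each keyword is an output column name mapped to (input column, aggregation)
result = (
    df.groupby("c_b", sort=False)
    .agg(
        c_a_unique_count=("c_a", "nunique"),
        c_c_unique_count=("c_c", "nunique"),
        c_d_max=("c_d", "max"),
    )
    .reset_index()
)

# Print as CSV; pass a path instead (hypothetical, e.g. "results.csv") to write a file
print(result.to_csv(index=False))
```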

Answer 1: 3 votes, accepted, 2016-08-28 01:50:19
