How to group a pandas DataFrame by a custom function

Question

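I have a DataFrame of the following form:

I want to group by the initial digit of col1 and take the mean of sum.

Basically, the result should be: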

col1    sum
8       1.5
3       3
7       5

What I have tried is:

def group_condition(col1):
    col1 = str(col1)
    if col1.startswith('8'):
        return 'y'
    else:
        return 'n'


augmented_error_table[['sum']].groupby(augmented_error_table['col1'].groupby(group_condition).groups).mean()

But this doesn't work; it gives me an empty DataFrame.
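A minimal sketch of why the attempt misfires, using hypothetical sample data (the same values Answer 3 uses, since the question's table was not preserved). When a function is passed to `groupby`, pandas calls it on each index label (0, 1, 2, 3 here), not on the values of col1, so every row ends up in group 'n'. Passing the resulting `.groups` dict back into `groupby` then maps index labels through a dict whose only key is 'n'; most likely no row matches, which would explain the empty result.

```python
import pandas as pd

# Hypothetical sample data; the question's original table was not preserved.
df = pd.DataFrame({"col1": [801, 802, 391, 701], "sum": [1, 2, 3, 5]})

def group_condition(col1):
    col1 = str(col1)
    if col1.startswith('8'):
        return 'y'
    else:
        return 'n'

# The function receives index labels 0..3, none of which starts with '8'.
by_index = df['col1'].groupby(group_condition).groups
print(sorted(by_index))   # ['n']

# Applying the function to the column values gives the intended keys.
by_value = df['col1'].map(group_condition)
print(by_value.tolist())  # ['y', 'y', 'n', 'n']
```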

Answer 1

Use astype(str) inside the groupby and take the first character:

df.groupby(df['col1'].astype(str).str[0])['sum'].mean()
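A self-contained version of this one-liner, assuming hypothetical sample data chosen to reproduce the expected result in the question:

```python
import pandas as pd

# Hypothetical sample data (not from the question, whose table was lost).
df = pd.DataFrame({"col1": [801, 802, 391, 701], "sum": [1, 2, 3, 5]})

# Convert col1 to strings and use the first character as the group key.
first_digit = df['col1'].astype(str).str[0]
result = df.groupby(first_digit)['sum'].mean()
print(result)
```

Note that the group keys are strings ('3', '7', '8'), and a negative number would be grouped under '-', since that is its first character.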


Answer 2

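I think the problem is that when groupby is given a function, it calls that function on each index label rather than on the column values. Pass a Series of group keys instead, built by applying the function to the column, like this:

table.groupby(table['col1'].apply(group_condition))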
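A runnable sketch of this approach, assuming hypothetical sample data. It also shows a second option: moving col1 into the index, so that a function passed to groupby sees the col1 values as index labels.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"col1": [801, 802, 391, 701], "sum": [1, 2, 3, 5]})

def group_condition(col1):
    # Same rule as in the question: 'y' if the number starts with 8, else 'n'.
    return 'y' if str(col1).startswith('8') else 'n'

# Option 1: build a Series of keys from the column values and group by it.
keys = df['col1'].map(group_condition)
by_keys = df.groupby(keys)['sum'].mean()
print(by_keys)

# Option 2: make col1 the index; groupby then applies the function to
# each index label, i.e. to the col1 values.
by_index = df.set_index('col1').groupby(group_condition)['sum'].mean()
print(by_index)
```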

Answer 3

import pandas as pd

df = pd.DataFrame(dict(col1=[801,802,391,701], sum=[1,2,3,5]))
# work out initial digit by list comprehension
df['init_digit'] = [str(x)[0] for x in df.col1]
# use groupby; the agg dict applies the mean to the sum column only
df.groupby(['init_digit']).agg({'sum': 'mean'})

Out[23]: 
            sum
init_digit     
3           3.0
7           5.0
8           1.5
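If you want the result in exactly the shape shown in the question (an integer col1 column, groups in order of first appearance), one possible sketch is below. It assumes col1 holds non-negative integers and uses hypothetical sample data; `sort=False` keeps groups in the order they first occur, and `as_index=False` keeps col1 as a regular column.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"col1": [801, 802, 391, 701], "sum": [1, 2, 3, 5]})

# Leading digit as an integer (assumes non-negative integers in col1).
lead = df['col1'].astype(str).str[0].astype(int)

result = (
    df.assign(col1=lead)                                   # replace col1 with its leading digit
      .groupby('col1', as_index=False, sort=False)[['sum']]
      .mean()
)
print(result)
```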

Answer metadata (votes, posting time):

Answer 1: 2 votes, 2017-10-29 02:45:19
Answer 2: 0 votes, 2015-06-30 04:08:23
Answer 3: 0 votes, accepted, 2015-06-30 06:35:44
