Concatenate or merge values computed on a grouped pandas DataFrame

Question

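I have a DataFrame containing density values. I want to group by the hours value, classify the densities within each group, and then add a new column to my original df containing the class number. However, this fails: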

import numpy as np
import pandas as pd
import pysal  # legacy PySAL 1.x API, which provided pysal.esda.mapclassify

df = pd.DataFrame({
    'hours': np.random.randint(0, 24, 10000),
    'density': np.random.sample(10000)})

def func(df):
    """Calculate equal intervals of a series or array."""
    intervals = pysal.esda.mapclassify.Equal_Interval(df.density, 5)
    # yb is an ndarray containing the bin indices, 0 - 4 in this case
    return intervals.yb

# Fails: transform is called on the whole DataFrameGroupBy
df['bins'] = df.groupby(df.hours).transform(func)

This raises AssertionError: length of join_axes must not be equal to 0

If I just group the object and apply the interval function, the result looks like this:

grp = df.groupby(df.hours).apply(func)
grp

Out[106]:
hours
0        [2, 4, 3, 4, 0, 4, 2, 2, 0, 1, 0, 0, 2, 2, 0, ...
1        [4, 1, 0, 4, 0, 2, 2, 3, 2, 3, 0, 3, 4, 3, 2, ...
2        [4, 1, 0, 2, 3, 4, 1, 1, 0, 3, 4, 4, 2, 4, 0, ...
3        [3, 0, 0, 4, 0, 0, 0, 1, 2, 2, 0, 2, 2, 2, 1, ...
4        [0, 1, 1, 2, 1, 3, 1, 3, 2, 2, 1, 4, 0, 4, 2, ...
5        [2, 0, 2, 1, 3, 1, 1, 0, 4, 4, 2, 1, 4, 1, 2, ...
6        [1, 2, 3, 3, 3, 2, 4, 1, 2, 1, 2, 0, 3, 2, 0, ...
7        [3, 0, 3, 1, 3, 1, 2, 1, 4, 2, 1, 2, 1, 1, 1, ...
8        [0, 1, 4, 3, 0, 1, 0, 0, 1, 0, 2, 1, 0, 1, 1, ...
9        [4, 2, 0, 4, 1, 3, 2, 3, 4, 1, 1, 4, 4, 4, 4, ...
10       [4, 4, 3, 3, 1, 2, 3, 0, 2, 4, 2, 4, 0, 2, 2, ...
11       [0, 1, 3, 0, 1, 1, 1, 1, 2, 1, 2, 0, 3, 3, 4, ...
12       [3, 1, 1, 0, 4, 4, 3, 0, 1, 2, 1, 1, 4, 2, 0, ...
13       [1, 1, 0, 2, 0, 1, 4, 1, 2, 2, 3, 1, 2, 0, 3, ...
14       [2, 4, 0, 2, 1, 2, 0, 4, 4, 2, 3, 4, 2, 1, 1, ...
15       [2, 4, 3, 4, 1, 0, 3, 1, 2, 0, 3, 4, 2, 2, 3, ...
16       [0, 4, 2, 3, 3, 4, 0, 3, 2, 0, 1, 0, 0, 2, 0, ...
17       [3, 1, 4, 4, 0, 4, 1, 0, 4, 3, 3, 2, 3, 1, 4, ...
18       [4, 3, 0, 2, 4, 2, 2, 0, 2, 2, 1, 2, 1, 0, 1, ...
19       [3, 0, 3, 1, 1, 0, 1, 1, 3, 3, 2, 3, 4, 0, 0, ...
20       [3, 0, 1, 4, 0, 0, 4, 2, 4, 2, 2, 0, 4, 0, 0, ...
21       [4, 2, 3, 3, 1, 2, 0, 4, 2, 0, 2, 2, 1, 2, 2, ...
22       [0, 4, 1, 1, 3, 1, 4, 1, 3, 4, 4, 0, 4, 4, 4, ...
23       [4, 1, 2, 0, 2, 0, 0, 0, 2, 3, 1, 1, 3, 0, 1, ...
dtype: object
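The apply result above is an object-dtype Series holding one array per hour, indexed by hour rather than by row, which is why it cannot be assigned as a column directly. A minimal sketch of scattering it back onto the original rows follows. It assumes each per-hour array is in the same row order as that hour's group (pandas keeps the original row order within each group). So that it runs without PySAL, it swaps pysal's Equal_Interval for pd.cut(..., 5, labels=False), which also makes five equal-width bins per group but may not place edge values exactly as pysal does; equal_interval_bins is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "hours": rng.integers(0, 24, 10_000),
    "density": rng.random(10_000),
})

def equal_interval_bins(density):
    # Hypothetical stand-in for pysal's Equal_Interval(..., 5).yb:
    # five equal-width bins over this group's range, labelled 0-4.
    return pd.cut(density, 5, labels=False).to_numpy()

# Like the question's apply: one array per hour, indexed by hour, not by row.
grp = df.groupby("hours")["density"].apply(equal_interval_bins)

# Scatter each hour's array back onto the row labels of that hour's group.
# Iterating the groupby yields each group with its rows in the same order
# the function saw them, so position i of grp[hour] belongs to row i.
df["bins"] = -1
for hour, group in df.groupby("hours")["density"]:
    df.loc[group.index, "bins"] = grp[hour]
```

This works, but it is a manual version of what transform does for you when it is applied to a single column.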

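Is there a standard way to merge or join values computed from a grouped object back into the original DataFrame, or should I be using transform differently?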

Answer 1

Try transforming the column instead, like this:

df['bins'] = df.groupby(df.hours).density.transform(func)

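Note: func must be changed to take a Series as its argument. Transform on the density column passes each group's density values to func as a Series, so use that argument directly instead of df.density.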
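A minimal sketch of the answer with func rewritten to take a Series. To keep it runnable without PySAL, it assumes pd.cut(density, 5, labels=False) as a substitute for pysal's Equal_Interval: both split the group's range into five equal-width bins labelled 0 to 4, though boundary handling may differ slightly.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "hours": rng.integers(0, 24, 10_000),
    "density": rng.random(10_000),
})

def func(density):
    """Return the equal-interval class (0-4) of each value in one group.

    transform on the 'density' column calls this once per hour with that
    hour's densities as a Series. pd.cut with bins=5 splits the group's
    range into five equal-width bins; it stands in for pysal's
    Equal_Interval(density, 5).yb here.
    """
    # Returning a Series with the group's own index lets transform align it.
    return pd.cut(density, 5, labels=False)

df["bins"] = df.groupby(df.hours).density.transform(func)
```

With PySAL 1.x installed, the body could instead be `return pysal.esda.mapclassify.Equal_Interval(density, 5).yb`, passing the Series straight through; since that array has the same length as the group, transform is expected to align it with the group's rows.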

Solution 1: accepted, score 0, 2014-02-25 16:12:47