Join or merge values calculated on a grouped pandas DataFrame
I have a DataFrame containing density values. I'd like to group by the 'hours' value, bin the densities within each group, and add a new column to my original df containing the bin number. This is failing, however:
import numpy as np
import pandas as pd
import pysal

df = pd.DataFrame({
    'hours': np.random.randint(0, 24, 10000),
    'density': np.random.sample(10000)})

def func(df):
    """Calculates equal intervals of a series or array."""
    intervals = pysal.esda.mapclassify.Equal_Interval(df.density, 5)
    # yb is an ndarray containing the bin indices, 0 - 4 in this case
    return intervals.yb

df['bins'] = df.groupby(df.hours).transform(func)
This gives AssertionError: length of join_axes must not be equal to 0
If I just group the object and apply the interval function, it looks like this:
grp = df.groupby(df.hours).apply(func)
grp
Out[106]:
hours
0 [2, 4, 3, 4, 0, 4, 2, 2, 0, 1, 0, 0, 2, 2, 0, ...
1 [4, 1, 0, 4, 0, 2, 2, 3, 2, 3, 0, 3, 4, 3, 2, ...
2 [4, 1, 0, 2, 3, 4, 1, 1, 0, 3, 4, 4, 2, 4, 0, ...
3 [3, 0, 0, 4, 0, 0, 0, 1, 2, 2, 0, 2, 2, 2, 1, ...
4 [0, 1, 1, 2, 1, 3, 1, 3, 2, 2, 1, 4, 0, 4, 2, ...
5 [2, 0, 2, 1, 3, 1, 1, 0, 4, 4, 2, 1, 4, 1, 2, ...
6 [1, 2, 3, 3, 3, 2, 4, 1, 2, 1, 2, 0, 3, 2, 0, ...
7 [3, 0, 3, 1, 3, 1, 2, 1, 4, 2, 1, 2, 1, 1, 1, ...
8 [0, 1, 4, 3, 0, 1, 0, 0, 1, 0, 2, 1, 0, 1, 1, ...
9 [4, 2, 0, 4, 1, 3, 2, 3, 4, 1, 1, 4, 4, 4, 4, ...
10 [4, 4, 3, 3, 1, 2, 3, 0, 2, 4, 2, 4, 0, 2, 2, ...
11 [0, 1, 3, 0, 1, 1, 1, 1, 2, 1, 2, 0, 3, 3, 4, ...
12 [3, 1, 1, 0, 4, 4, 3, 0, 1, 2, 1, 1, 4, 2, 0, ...
13 [1, 1, 0, 2, 0, 1, 4, 1, 2, 2, 3, 1, 2, 0, 3, ...
14 [2, 4, 0, 2, 1, 2, 0, 4, 4, 2, 3, 4, 2, 1, 1, ...
15 [2, 4, 3, 4, 1, 0, 3, 1, 2, 0, 3, 4, 2, 2, 3, ...
16 [0, 4, 2, 3, 3, 4, 0, 3, 2, 0, 1, 0, 0, 2, 0, ...
17 [3, 1, 4, 4, 0, 4, 1, 0, 4, 3, 3, 2, 3, 1, 4, ...
18 [4, 3, 0, 2, 4, 2, 2, 0, 2, 2, 1, 2, 1, 0, 1, ...
19 [3, 0, 3, 1, 1, 0, 1, 1, 3, 3, 2, 3, 4, 0, 0, ...
20 [3, 0, 1, 4, 0, 0, 4, 2, 4, 2, 2, 0, 4, 0, 0, ...
21 [4, 2, 3, 3, 1, 2, 0, 4, 2, 0, 2, 2, 1, 2, 2, ...
22 [0, 4, 1, 1, 3, 1, 4, 1, 3, 4, 4, 0, 4, 4, 4, ...
23 [4, 1, 2, 0, 2, 0, 0, 0, 2, 3, 1, 1, 3, 0, 1, ...
dtype: object
Is there a standard way to join or merge values calculated from a grouped object, or should I be using transform differently?
Try calling transform on the column, like this:
df['bins'] = df.groupby(df.hours).density.transform(func)
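Note: func needs to be changed to receive a Series as its argument, because transform on a single column passes each group's density values as a Series. Inside func, pass that Series directly to Equal_Interval instead of df.density.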
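As a rough sketch of what the Series-based version could look like without needing pysal installed, the example below swaps in pd.cut as a stand-in for Equal_Interval. This is an assumption, not the classifier from the question: pd.cut also splits each group's range into equal-width bins, but its handling of values on bin edges may differ slightly from pysal's. The function name equal_interval_bins and the seeded generator are made up for illustration.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (illustrative choice).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "hours": rng.integers(0, 24, 10000),
    "density": rng.random(10000),
})


def equal_interval_bins(density, k=5):
    """Return an equal-width bin index (0 .. k-1) for each value in a Series.

    Hypothetical stand-in for pysal's Equal_Interval(...).yb, built on pd.cut.
    """
    # An integer `bins` makes pd.cut split the range of the values into k
    # equal-width intervals; labels=False returns integer bin codes.
    return pd.cut(density, bins=k, labels=False)


# Selecting the column first means the function receives one Series per group,
# and transform aligns the result back onto the original index.
df["bins"] = df.groupby("hours")["density"].transform(equal_interval_bins)
```

Because transform returns a result indexed like the original frame, the new column can be assigned directly and no separate join or merge step is needed.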