pandas: using concat to add and rename multiple columns based on a single column

Question

I have this df:

  group owner  failed granted_pe  slots
0    g1    u1       0     single      1
1   g50   u92       0     shared      8
2   g50   u92       0     shared      1

df can be created using this code:

import pandas as pd

df = pd.DataFrame([['g1', 'u1', 0, 'single', 1],
                   ['g50', 'u92', '0', 'shared', '8'],
                   ['g50', 'u92', '0', 'shared', '1']], 
                  columns=['group', 'owner', 'failed','granted_pe', 'slots'])
df = (df.astype(dtype={'group':'str', 'owner':'str','failed':'int', 'granted_pe':'str', 'slots':'int'}))
print(df)

Using groupby, I create three columns calculated from the "slots" column:

df_calculated = pd.concat([
    df.loc[:,['group', 'slots']].groupby(['group']).sum(),
    df.loc[:,['group', 'slots']].groupby(['group']).mean(),
    df.loc[:,['group', 'slots']].groupby(['group']).max()
    ], axis=1)
print(df_calculated)

       slots  slots  slots
group                     
g1         1    1.0      1
g50        9    4.5      8

Issue 1: Naming the new columns appropriately
Can I add an argument to concat to name these columns "slots_sum", "slots_avg", and "slots_max"?

Issue 2: Add columns to df
I would prefer to add the new columns to the df just to the right of the "source" column ("slots" in this case). Desired output would look something like this:

  group owner  failed granted_pe  slots  slots_sum  slots_avg  slots_max
0    g1    u1       0     single      1          1        1.0          1
1   g50   u92       0     shared      8          9        4.5          8
2   g50   u92       0     shared      1          9        4.5          8

My actual df has 4.5 million rows and 23 columns. I will want to do something similar for other columns.

Answer 1

Use agg with add_prefix, then merge the result back into df on the group column:

yourdf = df.merge(
    df.groupby('group')['slots'].agg(['sum', 'mean', 'max']).add_prefix('slots_').reset_index(),
    on='group', how='left')
print(yourdf)

  group owner  failed    ...     slots_sum  slots_mean  slots_max
0    g1    u1       0    ...             1         1.0          1
1   g50   u92       0    ...             9         4.5          8
2   g50   u92       0    ...             9         4.5          8
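
The merge above appends the new columns at the end of the frame. For Issue 2 (placing them just to the right of the source column), one possible sketch is to compute each statistic with groupby(...).transform and place it with DataFrame.insert. The helper name add_group_stats and its parameters are hypothetical, made up for this illustration; it has not been timed on a 4.5-million-row frame.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['g1', 'g50', 'g50'],
    'owner': ['u1', 'u92', 'u92'],
    'failed': [0, 0, 0],
    'granted_pe': ['single', 'shared', 'shared'],
    'slots': [1, 8, 1],
})

def add_group_stats(frame, source, by='group', stats=('sum', 'mean', 'max')):
    """Hypothetical helper: return a copy of `frame` with per-group
    statistics of `source` inserted immediately to its right."""
    out = frame.copy()
    grouped = frame.groupby(by)[source]
    # Position just after the source column.
    pos = out.columns.get_loc(source) + 1
    for offset, stat in enumerate(stats):
        # transform broadcasts the group-level value back to every row,
        # in the original row order, so no merge is needed.
        values = grouped.transform(stat).to_numpy()
        out.insert(pos + offset, f'{source}_{stat}', values)
    return out

result = add_group_stats(df, 'slots')
print(result)
```

Calling the helper again with a different source column (for example add_group_stats(result, 'failed')) would repeat the pattern for other columns. The generated names follow the aggregation name, so the average column is slots_mean rather than slots_avg.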

Answer 2

Another way is to use the keys parameter in pd.concat, then flatten the resulting MultiIndex column headers:

df = pd.DataFrame([['g1', 'u1', 0, 'single', 1],
                   ['g50', 'u92', '0', 'shared', '8'],
                   ['g50', 'u92', '0', 'shared', '1']], 
                  columns=['group', 'owner', 'failed','granted_pe', 'slots'])
df = (df.astype(dtype={'group':'str', 'owner':'str','failed':'int', 'granted_pe':'str', 'slots':'int'}))

df_calculated = pd.concat([
    df.loc[:,['group', 'slots']].groupby(['group']).sum(),
    df.loc[:,['group', 'slots']].groupby(['group']).mean(),
    df.loc[:,['group', 'slots']].groupby(['group']).max()
    ], axis=1, keys=['sum','mean','max'])
df_calculated.columns = [f'{j}_{i}' for i,j in df_calculated.columns]
print(df_calculated)

Output:

       slots_sum  slots_mean  slots_max
group                                  
g1             1         1.0          1
g50            9         4.5          8
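
As a possible alternative for Issue 1, pandas 0.25 and later support named aggregation, which lets each output column be named directly, including "slots_avg" as the question asked. This is a minimal sketch on a cut-down version of the example frame, assuming a pandas version that has this feature; it is not part of either answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['g1', 'g50', 'g50'],
    'slots': [1, 8, 1],
})

# Named aggregation: keyword = (source column, aggregation function).
# The keyword becomes the output column name.
df_calculated = df.groupby('group').agg(
    slots_sum=('slots', 'sum'),
    slots_avg=('slots', 'mean'),
    slots_max=('slots', 'max'),
)
print(df_calculated)
```

The result is indexed by group, so it can be merged back into the original frame in the same way as in Answer 1.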
