New column in pandas: adding an array to a dataframe by applying a list groupby

Question

Given the following df:

  Id other  concat
0  A     z       1
1  A     y       2
2  B     x       3
3  B     w       4
4  B     v       5
5  B     u       6

I want a result with a new column that holds each group's values as a list:

  Id other  concat           new
0  A     z       1        [1, 2]
1  A     y       2        [1, 2]
2  B     x       3  [3, 4, 5, 6]
3  B     w       4  [3, 4, 5, 6]
4  B     v       5  [3, 4, 5, 6]
5  B     u       6  [3, 4, 5, 6]

This is similar to these questions:

grouping rows in list in pandas groupby

Replicating GROUP_CONCAT for pandas.DataFrame

However, what I want here is to apply the grouping you get from df.groupby('Id')['concat'].apply(list), which is a Series shorter than the dataframe, back to the original dataframe.

I have tried the code below, but it only produces the grouped Series and does not attach it to the dataframe:

import pandas as pd
df = pd.DataFrame( {'Id':['A','A','B','B','B','C'], 'other':['z','y','x','w','v','u'], 'concat':[1,2,5,5,4,6]})
df.groupby('Id')['concat'].apply(list)

I know that transform can be used to apply groupings to dataframes, but it does not work in this case. Assigning the result of apply(list) directly does not work either:

>>> df['new_col'] = df.groupby('Id')['concat'].transform(list)
>>> df
  Id  concat other  new_col
0  A       1     z        1
1  A       2     y        2
2  B       5     x        5
3  B       5     w        5
4  B       4     v        4
5  C       6     u        6
>>> df['new_col'] = df.groupby('Id')['concat'].apply(list)
>>> df
  Id  concat other new_col
0  A       1     z     NaN
1  A       2     y     NaN
2  B       5     x     NaN
3  B       5     w     NaN
4  B       4     v     NaN
5  C       6     u     NaN
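
The all-NaN column in the second attempt is what index alignment would produce: the grouped Series is indexed by the Id values, while df is indexed by row position, so no labels match on assignment. A minimal sketch that makes the mismatch visible (the names grouped and attempt are introduced here for illustration only):

```python
import pandas as pd

df = pd.DataFrame({
    'Id': ['A', 'A', 'B', 'B', 'B', 'C'],
    'other': ['z', 'y', 'x', 'w', 'v', 'u'],
    'concat': [1, 2, 5, 5, 4, 6],
})

# One list per group; the index holds the group keys, not the row labels.
grouped = df.groupby('Id')['concat'].apply(list)
print(grouped.index.tolist())  # group keys
print(df.index.tolist())       # row labels of the original frame

# Column assignment aligns on index labels. None of the group keys
# appear among the row labels, so every row ends up missing.
attempt = df.copy()
attempt['new_col'] = grouped
print(attempt['new_col'].isna().all())
```

So the grouped Series first has to be looked up by Id for each row before it can become a column.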

Answer 1

Use groupby with join:

df.join(df.groupby('Id').concat.apply(list).to_frame('new'), on='Id')
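
For reference, here is the same idea as a self-contained sketch, using the data from the desired output in the question. The intermediate name lists is an assumption made for readability, and the Series.map variant is an alternative not mentioned in the answer that should give the same column:

```python
import pandas as pd

df = pd.DataFrame({
    'Id': ['A', 'A', 'B', 'B', 'B', 'B'],
    'other': ['z', 'y', 'x', 'w', 'v', 'u'],
    'concat': [1, 2, 3, 4, 5, 6],
})

# One list per Id, indexed by Id.
lists = df.groupby('Id')['concat'].apply(list)

# join(on='Id') looks up each row's Id in the index of the right-hand frame.
joined = df.join(lists.to_frame('new'), on='Id')

# Alternative sketch: map each Id through the grouped Series.
mapped = df.assign(new=df['Id'].map(lists))

print(joined)
```

Neither call modifies df; assign the result back if the new column should persist.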

Answer 2

A less elegant (and slower) solution, but I will leave it here as an alternative.

def func(gr):
    gr['new'] = [list(gr.concat)] * len(gr.index)
    return gr
df.groupby('Id').apply(func)

%timeit df.groupby('Id').apply(func)
100 loops, best of 3: 4.18 ms per loop

%timeit df.join(df.groupby('Id').concat.apply(list).to_frame('new'), on='Id')
1000 loops, best of 3: 1.69 ms per loop

Answer 3

Use transform with [x.tolist()] or [x.values]:

In [1396]: df.groupby('Id')['concat'].transform(lambda x: [x.tolist()])
Out[1396]:
0          [1, 2]
1          [1, 2]
2    [3, 4, 5, 6]
3    [3, 4, 5, 6]
4    [3, 4, 5, 6]
5    [3, 4, 5, 6]
Name: concat, dtype: object

In [1397]: df['new'] = df.groupby('Id')['concat'].transform(lambda x: [x.tolist()])

In [1398]: df
Out[1398]:
  Id other  concat           new
0  A     z       1        [1, 2]
1  A     y       2        [1, 2]
2  B     x       3  [3, 4, 5, 6]
3  B     w       4  [3, 4, 5, 6]
4  B     v       5  [3, 4, 5, 6]
5  B     u       6  [3, 4, 5, 6]
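
Whether a one-element list returned from transform is broadcast to the whole group may depend on the pandas version. A variant that does not rely on broadcasting is to return one copy of the list per row. This is a sketch under that assumption, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    'Id': ['A', 'A', 'B', 'B', 'B', 'B'],
    'other': ['z', 'y', 'x', 'w', 'v', 'u'],
    'concat': [1, 2, 3, 4, 5, 6],
})

# Return a list with len(x) entries, each entry being the group's values,
# so the result already has one element per row of the group.
df['new'] = df.groupby('Id')['concat'].transform(
    lambda x: [x.tolist()] * len(x)
)

print(df)
```

Note that every row in a group then refers to the same list object, so mutating one cell's list in place would affect the other rows of that group.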

Answer 1
5, accepted, 2016-11-04 23:09:20

Answer 2
3, 2016-11-04 23:13:54

Answer 3
1, 2017-10-13 03:32:31
