Pandas groupby: add a new column for each value

Question

I hope the title speaks for itself. I'd just like to add that each key can be assumed to have the same number of values. Searching online for the title turned up the following solution:

Split pandas dataframe based on groupby

That was supposed to solve my problem, but it does not. Here is an example:

Input:

pd.DataFrame(data={'a':['foo','foo','foo','bar','bar','bar'],'b':[1,2,3,4,5,6]})

Output:

pd.DataFrame(data={'a':['foo','bar'],'b':[1,4],'c':[2,5],'d':[3,6]})

Intuitively, it would be a groupby without an aggregation function, or with an aggregation function that collects each key's values into a list.

Obviously, it can be done 'manually' using for loops etc., but for loops over large data sets are computationally very expensive.

Answer 1

Use GroupBy.cumcount to build a counter Series (or column) g, then reshape with DataFrame.set_index + Series.unstack or with DataFrame.pivot. Finish by cleaning up the result with DataFrame.add_prefix, DataFrame.reset_index and DataFrame.rename_axis:

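import pandas as pd

# df1 is the input DataFrame from the question
df1 = pd.DataFrame(data={'a':['foo','foo','foo','bar','bar','bar'],'b':[1,2,3,4,5,6]})

# Number the rows within each group: 0, 1, 2, ...
g = df1.groupby('a').cumcount()
# Index by (a, g), then move g into the columns
df = (df1.set_index(['a', g])['b']
         .unstack()
         .add_prefix('new_')
         .reset_index()
         .rename_axis(None, axis=1))
print(df)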
     a  new_0  new_1  new_2
0  bar      4      5      6
1  foo      1      2      3
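
To make the reshaping step easier to follow, here is a minimal sketch that breaks the chain above into its intermediate results. The names stacked and wide are only illustrative and do not come from the answer; the data is the input from the question.

```python
import pandas as pd

df1 = pd.DataFrame({'a': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'],
                    'b': [1, 2, 3, 4, 5, 6]})

# Position of each row within its group: 0, 1, 2 for 'foo', then 0, 1, 2 for 'bar'
g = df1.groupby('a').cumcount()

# Series of 'b' values indexed by the pair (a, position within group)
stacked = df1.set_index(['a', g])['b']

# Move the position level into the columns: one row per key,
# one column per position
wide = stacked.unstack()

print(g.tolist())
print(wide)
```

Because cumcount restarts at 0 for every key, each (a, position) pair is unique, which is what unstack needs in order to reshape without duplicate-entry errors.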

Or:

df1['g'] = df1.groupby('a').cumcount()
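# Keyword arguments: positional arguments to pivot were removed in pandas 2.0
df = (df1.pivot(index='a', columns='g', values='b')
         .add_prefix('new_')
         .reset_index()
         .rename_axis(None, axis=1))
print(df)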
     a  new_0  new_1  new_2
0  bar      4      5      6
1  foo      1      2      3

Answer 2

Here is an alternative approach using groupby.apply, plus string.ascii_lowercase if the column names matter:

from string import ascii_lowercase
import pandas as pd

df = pd.DataFrame(data={'a':['foo','foo','foo','bar','bar','bar'],'b':[1,2,3,4,5,6]})

# Groupby 'a'
g = df.groupby('a')['b'].apply(list)

# Construct new DataFrame from g
new_df = pd.DataFrame(g.values.tolist(), index=g.index).reset_index()

# Fix column names
new_df.columns = list(ascii_lowercase[:new_df.shape[1]])

print(new_df)

     a  b  c  d
0  bar  4  5  6
1  foo  1  2  3
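
Both answers return the rows sorted by key (bar before foo), whereas the output requested in the question lists foo first. If the original row order matters, one possible variation, sketched here as an assumption rather than as part of either answer, is to pass sort=False to groupby so that groups stay in order of first appearance. The names lists and expected are illustrative.

```python
from string import ascii_lowercase

import pandas as pd

df = pd.DataFrame({'a': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'],
                   'b': [1, 2, 3, 4, 5, 6]})

# sort=False keeps the groups in order of first appearance ('foo', then 'bar')
lists = df.groupby('a', sort=False)['b'].apply(list)

# One row per key, one column per list element
new_df = pd.DataFrame(lists.tolist(), index=lists.index).reset_index()

# Name the columns a, b, c, ... as in the question's desired output
new_df.columns = list(ascii_lowercase[:new_df.shape[1]])

# The output requested in the question
expected = pd.DataFrame({'a': ['foo', 'bar'], 'b': [1, 4], 'c': [2, 5], 'd': [3, 6]})

print(new_df)
```

Note that apply(list) builds a Python list per group, so on very large inputs the cumcount-based reshaping from Answer 1 may be the faster option.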

Answer 1: score 2, accepted, 2019-04-07 12:19:08
Answer 2: score 1, 2019-04-07 12:53:10
