Pandas groupby with new column for each value

I hope the title speaks for itself; I'd just like to add that it can be assumed that each key has the same number of values. Searching online for the title turned up the following solution:

Split pandas dataframe based on groupby

It is supposed to solve my problem, but it does not. Here is an example:

Input:

import pandas as pd

df1 = pd.DataFrame(data={'a':['foo','foo','foo','bar','bar','bar'],'b':[1,2,3,4,5,6]})

Desired output:

pd.DataFrame(data={'a':['foo','bar'],'b':[1,4],'c':[2,5],'d':[3,6]})

Intuitively, this would be a groupby without an aggregation function, or with an aggregation function that collects each key's values into a list.

Obviously, it can be done 'manually' with for loops and the like, but looping over large data sets is computationally expensive.

Use GroupBy.cumcount to build a per-group counter, either as a Series or as a column g, then reshape with DataFrame.set_index + Series.unstack or with DataFrame.pivot, and finally clean up with DataFrame.add_prefix, DataFrame.rename_axis and DataFrame.reset_index:

g = df1.groupby('a').cumcount()
df = (df1.set_index(['a', g])['b']
         .unstack()
         .add_prefix('new_')
         .reset_index()
         .rename_axis(None, axis=1))
print (df)
     a  new_0  new_1  new_2
0  bar      4      5      6
1  foo      1      2      3
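The counter is what lets this work without any aggregation: within each key, cumcount numbers the rows 0, 1, 2, ... in their original order, so every (key, counter) pair is unique and unstack has exactly one value for each cell. A minimal sketch that prints the intermediate pieces, rebuilding df1 from the question's input:

```python
import pandas as pd

df1 = pd.DataFrame(data={'a': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'],
                         'b': [1, 2, 3, 4, 5, 6]})

# cumcount numbers the rows of each group 0, 1, 2, ... in their original order
g = df1.groupby('a').cumcount()
print(g.tolist())  # [0, 1, 2, 0, 1, 2]

# Paired with 'a', the counter gives every value a unique (key, position) label,
# so unstack can move the position level into the columns without aggregating
stacked = df1.set_index(['a', g])['b']
print(stacked.index.is_unique)  # True
```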

Or:

df1['g'] = df1.groupby('a').cumcount()
# pivot's arguments are keyword-only in pandas >= 2.0
df = df1.pivot(index='a', columns='g', values='b').add_prefix('new_').reset_index().rename_axis(None, axis=1)
print (df)
     a  new_0  new_1  new_2
0  bar      4      5      6
1  foo      1      2      3
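If the exact column names from the desired output (b, c, d) and the original key order matter, the pivot result can be post-processed. This is a sketch under those assumptions, not part of the original answer: it maps counter i to the letter at position i + 1 of string.ascii_lowercase (offset by one because 'a' is already the key column) and restores the input's key order with reindex, since pivot sorts the keys.

```python
import pandas as pd
from string import ascii_lowercase

df1 = pd.DataFrame(data={'a': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'],
                         'b': [1, 2, 3, 4, 5, 6]})
df1['g'] = df1.groupby('a').cumcount()

wide = df1.pivot(index='a', columns='g', values='b')
# Map counter 0, 1, 2 to 'b', 'c', 'd'
wide = wide.rename(columns=lambda i: ascii_lowercase[i + 1])
# Keep the keys in order of first appearance (foo before bar) instead of sorted
wide = wide.reindex(df1['a'].unique())
df = wide.reset_index().rename_axis(None, axis=1)
print(df)
```

With the question's input this should match the desired output frame, with columns a, b, c, d and foo in the first row.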

Here is an alternative approach using groupby(...).apply(list) and string.ascii_lowercase, useful if the column names matter:

from string import ascii_lowercase

import pandas as pd

df = pd.DataFrame(data={'a':['foo','foo','foo','bar','bar','bar'],'b':[1,2,3,4,5,6]})

# Groupby 'a'
g = df.groupby('a')['b'].apply(list)

# Construct new DataFrame from g
new_df = pd.DataFrame(g.values.tolist(), index=g.index).reset_index()

# Fix column names
new_df.columns = list(ascii_lowercase[:new_df.shape[1]])

print(new_df)

     a  b  c  d
0  bar  4  5  6
1  foo  1  2  3
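The question assumes every key has the same number of values. As a hedged illustration of what happens otherwise, using a hypothetical input where 'bar' has only two values, the DataFrame constructor pads the shorter list with NaN, which also turns that column into float:

```python
import pandas as pd

# Hypothetical input: 'bar' has one value fewer than 'foo'
df = pd.DataFrame(data={'a': ['foo', 'foo', 'foo', 'bar', 'bar'],
                        'b': [1, 2, 3, 4, 5]})

# Collect each key's values into a list (keys come out sorted: bar, foo)
g = df.groupby('a')['b'].apply(list)

# Ragged lists are padded with NaN when building the frame
new_df = pd.DataFrame(g.values.tolist(), index=g.index).reset_index()
print(new_df)
```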
