Pandas groupby with new column for each value
I hope the title speaks for itself. I'd just like to add that it can be assumed that each key has the same number of values. Searching online for the title yielded the following solution:
Split pandas dataframe based on groupby
It is supposed to solve my problem, but it does not. Here is an example:
Input:
pd.DataFrame(data={'a':['foo','foo','foo','bar','bar','bar'],'b':[1,2,3,4,5,6]})
Output:
pd.DataFrame(data={'a':['foo','bar'],'b':[1,4],'c':[2,5],'d':[3,6]})
Intuitively, it would be a groupby function without an aggregation function, or with an aggregation function that collects each key's values into a list.
Obviously, it can be done 'manually' using for loops etc., but using for loops with large data sets is computationally very expensive.
Use GroupBy.cumcount to build a counter Series (or a column g), then reshape with DataFrame.set_index + Series.unstack or with DataFrame.pivot, and finally clean up with DataFrame.add_prefix, DataFrame.rename_axis and DataFrame.reset_index:
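import pandas as pd

# df1 is the input frame from the question
df1 = pd.DataFrame(data={'a':['foo','foo','foo','bar','bar','bar'],'b':[1,2,3,4,5,6]})

# Position of each row within its group: 0, 1, 2, ...
g = df1.groupby('a').cumcount()
df = (df1.set_index(['a', g])['b']
         .unstack()
         .add_prefix('new_')
         .reset_index()
         .rename_axis(None, axis=1))
print(df)
     a  new_0  new_1  new_2
0  bar      4      5      6
1  foo      1      2      3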
Or:
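# Store the within-group position as a regular column
df1['g'] = df1.groupby('a').cumcount()
# pivot takes keyword arguments; positional arguments are rejected by pandas 2.0 and later
df = (df1.pivot(index='a', columns='g', values='b')
         .add_prefix('new_')
         .reset_index()
         .rename_axis(None, axis=1))
print(df)
     a  new_0  new_1  new_2
0  bar      4      5      6
1  foo      1      2      3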
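Both variants hinge on the per-group counter produced by cumcount. The following is a minimal sketch of the intermediate steps, assuming the input frame from the question is stored as df1 (that name comes from the code above, not from the question); the names s and wide are only illustrative.

```python
import pandas as pd

# Input frame from the question; the variable name df1 is an assumption.
df1 = pd.DataFrame({'a': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'],
                    'b': [1, 2, 3, 4, 5, 6]})

# cumcount numbers the rows inside each group, starting at 0.
g = df1.groupby('a').cumcount()
print(g.tolist())

# Indexing by (a, counter) and selecting 'b' gives a Series with a two-level index.
s = df1.set_index(['a', g])['b']

# unstack moves the inner level (the counter) into the columns,
# so each key ends up with one row and one column per position.
wide = s.unstack()
print(wide)
```

Because the counter restarts for every key, it serves as the column label in the reshaped frame. If the keys had different numbers of values, the missing positions would be filled with NaN.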
Here is an alternative approach, using groupby.apply and string.ascii_lowercase, if column names are important:
from string import ascii_lowercase
import pandas as pd

df = pd.DataFrame(data={'a':['foo','foo','foo','bar','bar','bar'],'b':[1,2,3,4,5,6]})
# Group by 'a' and collect the values of 'b' into one list per key
g = df.groupby('a')['b'].apply(list)
# Construct a new DataFrame from g, one column per list position
new_df = pd.DataFrame(g.values.tolist(), index=g.index).reset_index()
# Fix column names: 'a', 'b', 'c', ... (works for up to 26 columns)
new_df.columns = list(ascii_lowercase[:new_df.shape[1]])
print(new_df)
     a  b  c  d
0  bar  4  5  6
1  foo  1  2  3
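Note that the results above list bar before foo, because groupby sorts the keys by default, whereas the expected output in the question lists foo first. A hedged sketch, assuming the order of first appearance is what is wanted, passes sort=False to groupby; everything else follows the approach above.

```python
from string import ascii_lowercase
import pandas as pd

df = pd.DataFrame({'a': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'],
                   'b': [1, 2, 3, 4, 5, 6]})

# sort=False keeps the keys in order of first appearance (foo, then bar).
g = df.groupby('a', sort=False)['b'].apply(list)

# One row per key, one column per position in the list.
new_df = pd.DataFrame(g.tolist(), index=g.index).reset_index()

# Rename the columns to 'a', 'b', 'c', ... as in the question's expected output.
new_df.columns = list(ascii_lowercase[:new_df.shape[1]])
print(new_df)
```

The same sort=False argument is not enough for the cumcount variants, since unstack and pivot sort the resulting index themselves.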