Group dataframe and aggregate data from several columns into a new column
I want to group this dataframe by column a, and create a new column (d) with all values from both column b and column c.
import pandas as pd

data_dict = {'a': list('aabbcc'),
             'b': list('123456'),
             'c': list('xxxyyy')}
df = pd.DataFrame(data_dict)
From this...
to this
I've figured out one way of doing it:
df['d'] = df['b'] + df['c']
df.groupby('a').agg({'d': lambda x: ','.join(x)})
but is there a more pandas way?
I think "more pandas" is hard to define, but you can call groupby and agg directly on the Series if you're trying to avoid the temp column:
g = (df['b'] + df['c']).groupby(df['a']).agg(','.join).to_frame('d')
g

       d
a
a  1x,2x
b  3x,4y
c  5y,6y
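If you'd rather keep everything in one method chain, assign is another option that also leaves df without a d column. This is only a sketch, assuming the same data as in the question; the name result is mine, not from the original post.

```python
import pandas as pd

df = pd.DataFrame({
    'a': list('aabbcc'),
    'b': list('123456'),
    'c': list('xxxyyy'),
})

# assign() returns a new DataFrame with the extra column,
# so the original df is left unchanged.
result = (
    df.assign(d=df['b'] + df['c'])
      .groupby('a')['d']
      .agg(','.join)
      .to_frame()
)
print(result)
```

Whether this counts as "more pandas" than grouping the Series directly is a matter of taste; both avoid writing the temp column back to df.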