Group by one column and collect two columns into a list in pandas
I have a DataFrame:
id name value flag
1 a x F
1 b y A
2 c z B
3 d m Q
I want to group by id and put the value column into a new column as a list. I can do that with:
df.groupby('id')['value'].apply(list).reset_index()
Is there a way to group by 'id' but put two columns (name and value) into the list?
My desired output:
id col
1 [[a,x],[b,y]]
2 [[c,z]]
3 [[d,m]]
Convert the columns to a NumPy array with values and then to lists, either inside the groupby or separately as a new Series:
df = (df.groupby('id')
        .apply(lambda x: x[['name','value']].values.tolist())
        .reset_index(name='col'))
print (df)
id col
0 1 [[a, x], [b, y]]
1 2 [[c, z]]
2 3 [[d, m]]
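A variant of the groupby/apply snippet above, offered as a sketch rather than the answer's own code: selecting the two columns before calling apply keeps the grouping column out of the function's input, which recent pandas releases (2.2, to my knowledge) flag with a deprecation warning. The frame is rebuilt from the question's data so the block runs on its own; the names out and g are only illustrative.

```python
import pandas as pd

# Rebuild the question's frame so the example is self-contained.
df = pd.DataFrame({
    'id': [1, 1, 2, 3],
    'name': ['a', 'b', 'c', 'd'],
    'value': ['x', 'y', 'z', 'm'],
    'flag': ['F', 'A', 'B', 'Q'],
})

# Select the two columns first, then turn each group into a list of
# [name, value] pairs and name the resulting column 'col'.
out = (df.groupby('id')[['name', 'value']]
         .apply(lambda g: g.values.tolist())
         .reset_index(name='col'))
print(out)
```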
Or:
s = pd.Series(df[['name','value']].values.tolist(), index=df.index)
df = s.groupby(df['id']).apply(list).reset_index(name='col')
print (df)
id col
0 1 [[a, x], [b, y]]
1 2 [[c, z]]
2 3 [[d, m]]
Alternatively, if tuples inside the lists are acceptable:
s = pd.Series(list(zip(df['name'],df['value'])), index=df.index)
df = s.groupby(df['id']).apply(list).reset_index(name='col')
print (df)
id col
0 1 [(a, x), (b, y)]
1 2 [(c, z)]
2 3 [(d, m)]
Use zip inside apply, i.e.:
df.groupby('id').apply(lambda x: list(zip(x['name'],x['value'])))
id
1 [(a, x), (b, y)]
2 [(c, z)]
3 [(d, m)]
dtype: object
To match your exact output, use to_frame and reset_index, i.e.:
df.groupby('id').apply(lambda x: list(zip(x['name'],x['value']))).to_frame('col').reset_index()
id col
0 1 [(a, x), (b, y)]
1 2 [(c, z)]
2 3 [(d, m)]
You can use numpy's stack function to convert the two columns to one column of lists, and then use pandas' own groupby function.
Imports and building the DataFrame:
import pandas as pd
import numpy as np
df = pd.DataFrame(
[[1,'a','x','F'],
[1,'b','y','A'],
[2,'c','z','B'],
[3,'d','m','Q']],
columns=['id','name','value','flag']
).set_index('id')
The function:
df.assign(col=list(np.stack(df[['name','value']].values))) \
.groupby(level=0)['col'].apply(list).to_frame()
Which returns:
col
id
1 [[a, x], [b, y]]
2 [[c, z]]
3 [[d, m]]
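For comparison, here is a sketch of the same index-based grouping that builds the per-row lists with values.tolist() rather than np.stack. This is a swapped-in technique, not the answer's code, and it assumes plain Python lists are wanted in the result; the name out is only illustrative.

```python
import pandas as pd

# Same frame as above, indexed by id.
df = pd.DataFrame(
    [[1, 'a', 'x', 'F'],
     [1, 'b', 'y', 'A'],
     [2, 'c', 'z', 'B'],
     [3, 'd', 'm', 'Q']],
    columns=['id', 'name', 'value', 'flag']
).set_index('id')

# Build one [name, value] list per row, then collect the rows of each
# index level into an outer list.
out = (df.assign(col=df[['name', 'value']].values.tolist())
         .groupby(level=0)['col'].apply(list).to_frame())
print(out)
```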
Fixing a previous errant solution:
df = pd.DataFrame({"i" : [i % 3 for i in range(20)], "x" : range(20), "y" : range(20)}) # Init a dummy dframe
df = df.groupby('i')\
.apply(lambda row: tuple(zip(row['x'], row['y'])))\
.reset_index()
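Calling reset_index() with no arguments on an unnamed Series leaves the collected column labelled 0. A small sketch on the same dummy frame that passes name='col' (the column name from the question) and selects x and y before apply; the name out is only illustrative.

```python
import pandas as pd

# Same dummy frame as above: three groups (i = 0, 1, 2) over 20 rows.
df = pd.DataFrame({"i": [i % 3 for i in range(20)],
                   "x": range(20),
                   "y": range(20)})

# Collect (x, y) pairs per group as a tuple and name the result column.
out = (df.groupby("i")[["x", "y"]]
         .apply(lambda g: tuple(zip(g["x"], g["y"])))
         .reset_index(name="col"))
print(out)
```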