I have a dataframe:
id name value flag
1 a x F
1 b y A
2 c z B
3 d m Q
If I want to group by id and put the value column into a new column as a list,
I can do:
df.groupby('id')['value'].apply(list).reset_index()
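A self-contained version of that single-column baseline, rebuilding the sample data from the question. The agg(list) line is an addition on my part; as far as I know it is an equivalent spelling of apply(list) for this case:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": [1, 1, 2, 3],
    "name": ["a", "b", "c", "d"],
    "value": ["x", "y", "z", "m"],
    "flag": ["F", "A", "B", "Q"],
})

# Collect the 'value' column of each 'id' group into a Python list
out = df.groupby("id")["value"].apply(list).reset_index()

# agg(list) should give the same lists for this data
out_agg = df.groupby("id")["value"].agg(list).reset_index()

print(out)
```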
Is there any way to group by 'id' but put two columns (name and value) into a list?
My desired output:
id col
1 [[a,x],[b,y]]
2 [[c,z]]
3 [[d,m]]
Convert the columns to a NumPy array via values, then to lists, either inside groupby or separately into a new Series:
# wrap the method chain in parentheses so it can span several lines
df = (df.groupby('id')
        .apply(lambda x: x[['name','value']].values.tolist())
        .reset_index(name='col'))
print(df)
id col
0 1 [[a, x], [b, y]]
1 2 [[c, z]]
2 3 [[d, m]]
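A self-contained sketch of the same idea on the question's data. Two changes are my own assumptions, not part of the answer above: the two columns are selected right after groupby (so the grouping column is never passed to the lambda, which recent pandas releases may warn about), and to_numpy() is used as the newer spelling of .values:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 3],
    "name": ["a", "b", "c", "d"],
    "value": ["x", "y", "z", "m"],
    "flag": ["F", "A", "B", "Q"],
})

# For each id, turn the (name, value) sub-frame into a list of [name, value] rows
out = (
    df.groupby("id")[["name", "value"]]
      .apply(lambda g: g.to_numpy().tolist())
      .reset_index(name="col")
)
print(out)
```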
Or:
s = pd.Series(df[['name','value']].values.tolist(), index=df.index)
df = s.groupby(df['id']).apply(list).reset_index(name='col')
print(df)
id col
0 1 [[a, x], [b, y]]
1 2 [[c, z]]
2 3 [[d, m]]
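The Series-based variant generalizes to any set of columns. A minimal sketch, wrapped in a hypothetical helper rows_as_lists_by_group (the name and signature are illustrative only, not a pandas API):

```python
import pandas as pd

def rows_as_lists_by_group(frame, key, cols, out_name="col"):
    """Hypothetical helper: group `frame` by `key` and collect the `cols`
    of each row as a list, giving one list of row-lists per group."""
    # One Python list per row, aligned to the original index
    rows = pd.Series(frame[cols].to_numpy().tolist(), index=frame.index)
    # Group the row-lists by the key column and gather them into a list per group
    return rows.groupby(frame[key]).apply(list).reset_index(name=out_name)

df = pd.DataFrame({
    "id": [1, 1, 2, 3],
    "name": ["a", "b", "c", "d"],
    "value": ["x", "y", "z", "m"],
    "flag": ["F", "A", "B", "Q"],
})

out = rows_as_lists_by_group(df, "id", ["name", "value", "flag"])
print(out)
```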
Also, if tuples inside the lists are acceptable:
s = pd.Series(list(zip(df['name'],df['value'])), index=df.index)
df = s.groupby(df['id']).apply(list).reset_index(name='col')
print(df)
id col
0 1 [(a, x), (b, y)]
1 2 [(c, z)]
2 3 [(d, m)]
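If you do need lists rather than tuples, the zip route still works without going through a NumPy array; a sketch that converts each pair with list (my own variation on the code above):

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 3],
    "name": ["a", "b", "c", "d"],
    "value": ["x", "y", "z", "m"],
    "flag": ["F", "A", "B", "Q"],
})

# zip yields (name, value) tuples; map(list, ...) turns each into a [name, value] list
pairs = pd.Series(list(map(list, zip(df["name"], df["value"]))), index=df.index)

# Group the per-row lists by id and collect them, as in the tuple version
out = pairs.groupby(df["id"]).apply(list).reset_index(name="col")
print(out)
```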
Use zip inside apply, i.e.:
df.groupby('id').apply(lambda x: list(zip(x['name'],x['value'])))
id
1 [(a, x), (b, y)]
2 [(c, z)]
3 [(d, m)]
dtype: object
To match your exact output, use to_frame and reset_index, i.e.:
df.groupby('id').apply(lambda x: list(zip(x['name'],x['value']))).to_frame('col').reset_index()
id col
0 1 [(a, x), (b, y)]
1 2 [(c, z)]
2 3 [(d, m)]
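For a Series like this one, to_frame('col').reset_index() and reset_index(name='col') should produce the same frame. A quick check on the question's data; selecting the two columns after groupby is my assumption, used to keep the grouping column out of the lambda:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 3],
    "name": ["a", "b", "c", "d"],
    "value": ["x", "y", "z", "m"],
    "flag": ["F", "A", "B", "Q"],
})

# Per-group list of (name, value) tuples, indexed by id
pairs = df.groupby("id")[["name", "value"]].apply(
    lambda x: list(zip(x["name"], x["value"]))
)

via_to_frame = pairs.to_frame("col").reset_index()
via_reset = pairs.reset_index(name="col")
print(via_to_frame)
```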
You can use NumPy's stack function to convert the two columns into one column of per-row NumPy arrays, and then use pandas' own groupby function.
Imports and building the dataframe:
import pandas as pd
import numpy as np
df = pd.DataFrame(
[[1,'a','x','F'],
[1,'b','y','A'],
[2,'c','z','B'],
[3,'d','m','Q']],
columns=['id','name','value','flag']
).set_index('id')
The transformation:
df.assign(col=list(np.stack(df[['name','value']].values))) \
.groupby(level=0)['col'].apply(list).to_frame()
Which returns:
col
id
1 [[a, x], [b, y]]
2 [[c, z]]
3 [[d, m]]
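Although the printed output looks like nested lists, each inner element produced by the stack approach is a 1-D NumPy array. A sketch that checks this and, as a variant of my own, builds plain Python lists with tolist() instead:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 'a', 'x', 'F'],
     [1, 'b', 'y', 'A'],
     [2, 'c', 'z', 'B'],
     [3, 'd', 'm', 'Q']],
    columns=['id', 'name', 'value', 'flag'],
).set_index('id')

# Original approach: each cell of 'col' holds a 1-D array such as array(['a', 'x'])
res = (df.assign(col=list(np.stack(df[['name', 'value']].values)))
         .groupby(level=0)['col'].apply(list).to_frame())
print(type(res.loc[1, 'col'][0]))  # <class 'numpy.ndarray'>

# Variant: tolist() gives one Python list per row, so the result is nested lists
res_lists = (df.assign(col=df[['name', 'value']].values.tolist())
               .groupby(level=0)['col'].apply(list).to_frame())
print(res_lists)
```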
Fixing a previous errant solution:
import pandas as pd

df = pd.DataFrame({"i": [i % 3 for i in range(20)], "x": range(20), "y": range(20)})  # Init a dummy dframe
# each group's (x, y) pairs collected into one tuple; parentheses allow the multi-line chain
df = (df.groupby('i')
        .apply(lambda grp: tuple(zip(grp['x'], grp['y'])))
        .reset_index())
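Note that a bare reset_index() on the resulting Series names the value column 0. A sketch that gives it a name instead ("pairs" is an arbitrary choice) and selects the x and y columns after groupby, which is my assumption to keep the key column out of the lambda:

```python
import pandas as pd

df = pd.DataFrame({"i": [i % 3 for i in range(20)], "x": range(20), "y": range(20)})

out = (
    df.groupby("i")[["x", "y"]]
      .apply(lambda g: tuple(zip(g["x"], g["y"])))
      .reset_index(name="pairs")  # named column instead of the default 0
)
print(out)
```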