I have the following data frame df:
id stage days
--------------------
a1 A 1
a2 A 3
a3 A 2
a4 A 5
a1 B 1
a2 B 2
a1 C 2
a3 D 3
I applied a lambda function to a pandas GroupBy object to build a list of the stage values for each id. The code works fine:
df1 = df.groupby('id').apply(lambda x: list(x['stage'])).reset_index()
df1
The output looks like:
a1 [A, B, C]
a2 [A, B]
a3 [A, D]
a4 [A]
Now I want to create a list of tuples for each id group, where each tuple is (stage, days). I modified the above code as follows:
df2 = df.groupby('id').apply(lambda x:list((x['stage'],x['days']))).reset_index()
df2
I want df2 to look like:
a1 [(A, 1), (B, 1), (C, 2)]
a2 [(A, 3), (B, 2)]
a3 [(A, 2), (D, 3)]
a4 [(A, 5)]
However, the output shows only the first (stage, days) of each id, and it doesn't look like a tuple:
a1 [[A], [1]]
a2 [[A], [3]]
a3 [[A], [2]]
a4 [[A], [5]]
Did I miss anything here? Thanks!
Here's a minimal example. I think you want something like this:
import pandas as pd
df1 = pd.DataFrame({'A1':['a','b','c','a','b'],'B':[3,5,7,8,9], 'C':[10,20,30,40,50]})
df1.groupby('A1').apply(lambda df: list(zip(df['B'],df['C'])))
A1
a [(3, 10), (8, 40)]
b [(5, 20), (9, 50)]
c [(7, 30)]
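The reason the attempt in the question does not produce tuples is that list((x['stage'], x['days'])) turns a 2-tuple of whole Series into a 2-element list of Series. It never pairs the values row by row, which is what zip does. As a rough sketch, here is the same idea applied to the data from the question. The column name stage_days is my own choice, not something from the question:

```python
import pandas as pd

# Data frame reconstructed from the table in the question.
df = pd.DataFrame({
    'id':    ['a1', 'a2', 'a3', 'a4', 'a1', 'a2', 'a1', 'a3'],
    'stage': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'D'],
    'days':  [1, 3, 2, 5, 1, 2, 2, 3],
})

# zip pairs stage and days row by row within each group;
# list(...) materializes the pairs as a list of tuples.
df2 = (
    df.groupby('id')[['stage', 'days']]
      .apply(lambda g: list(zip(g['stage'], g['days'])))
      .reset_index(name='stage_days')  # 'stage_days' is an illustrative name
)
print(df2)
```

Selecting the two columns before apply keeps the grouping column out of the lambda's input. Rows within each group keep their original order.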
Using the data from Manish's answer: creating the tuple column outside of apply should be faster.
df1['New']=list(zip(df1.B,df1.C))
df1
Out[1132]:
A1 B C New
0 a 3 10 (3, 10)
1 b 5 20 (5, 20)
2 c 7 30 (7, 30)
3 a 8 40 (8, 40)
4 b 9 50 (9, 50)
df1.groupby('A1').New.apply(list)
Out[1133]:
A1
a [(3, 10), (8, 40)]
b [(5, 20), (9, 50)]
c [(7, 30)]
Name: New, dtype: object
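For completeness, the same approach might look like this on the data from the question. This is only a sketch: the helper column name stage_days is an assumption of mine, and I have not benchmarked it against the apply-with-zip version.

```python
import pandas as pd

# Data frame reconstructed from the table in the question.
df = pd.DataFrame({
    'id':    ['a1', 'a2', 'a3', 'a4', 'a1', 'a2', 'a1', 'a3'],
    'stage': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'D'],
    'days':  [1, 3, 2, 5, 1, 2, 2, 3],
})

# Build the (stage, days) tuples once for the whole frame,
# outside of any per-group function call.
df['stage_days'] = list(zip(df['stage'], df['days']))  # illustrative column name

# Collect the ready-made tuples into one list per id.
df2 = df.groupby('id')['stage_days'].apply(list).reset_index()
print(df2)
```

Note that this adds a helper column to df. Drop it afterwards with df.drop(columns='stage_days') if it is not wanted.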