
pandas: apply: using a lambda to create a list of tuples

I have the following data frame df:

    id  stage days
--------------------
    a1   A    1
    a2   A    3
    a3   A    2
    a4   A    5
    a1   B    1
    a2   B    2
    a1   C    2
    a3   D    3

I applied a lambda function to a pandas GroupBy object to collect the stage column into a list for each id. The code works fine.

df1 = df.groupby('id').apply(lambda x: list(x['stage'])).reset_index() 
df1

The output looks like:

a1  [A, B, C]
a2  [A, B]
a3  [A, D]
a4  [A]

Now I want to create a list of tuples for each id group. Each tuple is (stage, days). I modified the code above as follows:

df2 = df.groupby('id').apply(lambda x:list((x['stage'],x['days']))).reset_index() 
df2

I want df2 to look like this:

a1  [(A, 1), (B, 1), (C, 2)]
a2  [(A, 3), (B, 2)]
a3  [(A, 2), (D, 3)]
a4  [(A, 5)]

However, the output contains only the first (stage, days) pair for each id, and the entries don't look like tuples:

a1  [[A], [1]]
a2  [[A], [3]]
a3  [[A], [2]]
a4  [[A], [5]]

Did I miss anything here? Thanks!

Here's a minimal example. I think you want something like this:

import pandas as pd

df1 = pd.DataFrame({'A1': ['a', 'b', 'c', 'a', 'b'], 'B': [3, 5, 7, 8, 9], 'C': [10, 20, 30, 40, 50]})

df1.groupby('A1').apply(lambda df: list(zip(df['B'],df['C'])))

A1
a    [(3, 10), (8, 40)]
b    [(5, 20), (9, 50)]
c             [(7, 30)]
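
Applied to the data in the question, the same idea might look like the sketch below. The frame is rebuilt by hand from the table in the question, and the column name stage_days is only an illustrative choice, not something from the original post. The sketch also shows why the first attempt did not work: list((x['stage'], x['days'])) turns a 2-tuple of Series into a 2-element list of Series and never pairs the values row by row. Pairing them is what zip does.

```python
import pandas as pd

# The question's data frame, rebuilt by hand from the table shown there.
df = pd.DataFrame({
    'id':    ['a1', 'a2', 'a3', 'a4', 'a1', 'a2', 'a1', 'a3'],
    'stage': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'D'],
    'days':  [1, 3, 2, 5, 1, 2, 2, 3],
})

# What the original lambda builds for one group: a list holding two whole
# Series (all stages, then all days), not one tuple per row.
g = df[df['id'] == 'a1']
pair = list((g['stage'], g['days']))

# zip walks both columns in lockstep and yields one (stage, days) tuple per row.
# 'stage_days' is a hypothetical column name chosen for this example.
df2 = (
    df.groupby('id')[['stage', 'days']]
      .apply(lambda g: list(zip(g['stage'], g['days'])))
      .reset_index(name='stage_days')
)
print(df2)
```

Selecting [['stage', 'days']] before apply is optional here. As far as I know, recent pandas releases warn when apply operates on the grouping column, and selecting the columns explicitly sidesteps that.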

Using the data from Manish's answer: creating the tuple column outside of apply should be faster.

df1['New']=list(zip(df1.B,df1.C))
df1
Out[1132]: 
  A1  B   C      New
0  a  3  10  (3, 10)
1  b  5  20  (5, 20)
2  c  7  30  (7, 30)
3  a  8  40  (8, 40)
4  b  9  50  (9, 50)
df1.groupby('A1').New.apply(list)
Out[1133]: 
A1
a    [(3, 10), (8, 40)]
b    [(5, 20), (9, 50)]
c             [(7, 30)]
Name: New, dtype: object
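
For the question's own frame, this second approach could be sketched as follows. The frame is again rebuilt by hand from the question's table, and stage_days is a hypothetical column name. I have not timed it, so treat "faster" as something to check on your own data.

```python
import pandas as pd

# The question's data frame, rebuilt by hand from the table shown there.
df = pd.DataFrame({
    'id':    ['a1', 'a2', 'a3', 'a4', 'a1', 'a2', 'a1', 'a3'],
    'stage': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'D'],
    'days':  [1, 3, 2, 5, 1, 2, 2, 3],
})

# Build the (stage, days) tuples once for the whole frame, outside of apply.
# assign returns a new frame, so df itself is left unchanged.
with_pairs = df.assign(stage_days=list(zip(df['stage'], df['days'])))

# Then only the cheap step runs per group: collecting the tuples into a list.
df2 = with_pairs.groupby('id')['stage_days'].apply(list).reset_index()
print(df2)
```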
