Conditional grouping pandas DataFrame

I have a DataFrame with the following columns:

df = pd.DataFrame({'Name': ['Abe', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy'],
                   'Lenght': ['10', '12.5', '11', '12.5', '12', '11', '12.5', '10', '5'],
                   'Try': [0, 0, 0, 1, 1, 1, 2, 2, 2],
                   'Batch': [0, 0, 0, 0, 0, 0, 0, 0, 0]})

In each batch, a name gets arbitrarily many tries to reach the greatest length (the Lenght column). I want to create a column win that has the value 1 for the greatest length in a batch and 0 otherwise, subject to the following conditions:

  • If one name holds the greatest length in a batch in multiple tries, only the first try gets the value 1 in win (see Abe in the example above).

  • If two different names hold the same greatest length, both get the value 1 in win.

What I have managed to do so far is:

df.groupby(['Batch', 'Name'])['Lenght'].apply(lambda x: (x == x.max()).map({True: 1, False: 0}))

But it doesn't support all the conditions. Any insight would be highly appreciated. Many thanks.

Expected output:

df = pd.DataFrame({'Name': ['Abe', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy'],
                   'Lenght': ['10', '12.5', '11', '12.5', '12', '11', '12.5', '10', '5'],
                   'Try': [0, 0, 0, 1, 1, 1, 2, 2, 2],
                   'Batch': [0, 0, 0, 0, 0, 0, 0, 0, 0],
                   'win': [0, 1, 0, 1, 0, 0, 0, 0, 0]})

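Use GroupBy.transform to get the maximum Lenght per Batch and compare it with the Lenght column using Series.eq. Then, among the rows that hold the maximum, keep the first row per Name and Batch (or every row, if that name reached the maximum in only one Try). Finally, convert the boolean mask to 1/0 with Series.astype: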

# sample data changed for testing: the first row is now a copy of the second row
df = pd.DataFrame({'Name': ['Karl', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy'], 
               'Lenght': ['12.5', '12.5', '11', '12.5', '12', '11', '12.5', '10', '5'],
              'Try': [0,0,0,1,1,1,2,2,2],
              'Batch':[0,0,0,0,0,0,0,0,0]})

df['Lenght'] = df['Lenght'].astype(float)


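# m1: rows whose Lenght equals the maximum of their Batch
m1 = df.groupby('Batch')['Lenght'].transform('max').eq(df['Lenght'])

# work only on the rows that hold the maximum
df1 = df[m1]
# m2: names that reached the maximum in a single Try (per Batch)
m2 = df1.groupby(['Name', 'Batch'])['Try'].transform('nunique').eq(1)
# m3: first occurrence of each Name within a Batch
m3 = ~df1.duplicated(['Name', 'Batch'])

# m2 and m3 exist only for the rows in df1; the other rows end up as 0
df['new'] = ((m2 | m3) & m1).astype(int)
print(df)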
    Name  Lenght  Try  Batch  new
0   Karl    12.5    0      0    1
1   Karl    12.5    0      0    1
2  Billy    11.0    0      0    0
3    Abe    12.5    1      0    1
4   Karl    12.0    1      0    0
5  Billy    11.0    1      0    0
6    Abe    12.5    2      0    0
7   Karl    10.0    2      0    0
8  Billy     5.0    2      0    0
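
To check the idea against the original data from the question, here is a minimal sketch that wraps similar logic in a function. The helper name mark_winners is hypothetical and not part of pandas. It assumes that "first try" means the smallest Try value per Name and Batch, whereas the code above relies on row order, so the two can differ if the rows are not sorted by Try.

```python
import pandas as pd


def mark_winners(df):
    """Return a 0/1 Series flagging the winning rows (hypothetical helper)."""
    # The question stores Lenght as strings, so convert before comparing.
    length = df['Lenght'].astype(float)
    # Rows that reach the maximum length of their batch.
    is_max = length.eq(length.groupby(df['Batch']).transform('max'))
    # Among those rows, the smallest Try per name and batch
    # (assumption: "first try" means the lowest Try value).
    first_try = df.loc[is_max].groupby(['Batch', 'Name'])['Try'].transform('min')
    # Align back to the full frame; rows outside is_max become NaN and never match.
    is_first = df['Try'].eq(first_try.reindex(df.index))
    return (is_max & is_first).astype(int)


# Original data from the question.
df = pd.DataFrame({'Name': ['Abe', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy'],
                   'Lenght': ['10', '12.5', '11', '12.5', '12', '11', '12.5', '10', '5'],
                   'Try': [0, 0, 0, 1, 1, 1, 2, 2, 2],
                   'Batch': [0, 0, 0, 0, 0, 0, 0, 0, 0]})

df['win'] = mark_winners(df)
print(df)
```

On this data the win column matches the expected output from the question: Karl wins with try 0, and Abe wins only with try 1, not again with try 2.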
