Conditional grouping in a pandas DataFrame

Question

I have a DataFrame with the following columns:

import pandas as pd

df = pd.DataFrame({'Name': ['Abe', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy'],
                   'Lenght': ['10', '12.5', '11', '12.5', '12', '11', '12.5', '10', '5'],
                   'Try': [0,0,0,1,1,1,2,2,2],
                   'Batch':[0,0,0,0,0,0,0,0,0]})
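
One detail worth noting about this data: the Lenght values are stored as strings, so any max taken on the column as-is compares text rather than numbers. A minimal sketch of the difference, using a standalone Series (the name `lengths` is only for illustration) with the same values:

```python
import pandas as pd

# Same string-typed values as the Lenght column in the question.
lengths = pd.Series(['10', '12.5', '11', '12.5', '12', '11', '12.5', '10', '5'])

# As strings, max() compares character by character, so '5' beats '12.5'.
print(lengths.max())                  # 5

# After casting to float, max() is the numeric maximum.
print(lengths.astype(float).max())    # 12.5
```

Casting the column with astype(float) before grouping avoids this.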

In each batch, a name gets arbitrarily many tries to reach the greatest Lenght. What I want to do is create a column win that has the value 1 for the greatest Lenght in a batch and 0 otherwise, subject to the following conditions:

If one name holds the greatest Lenght in a batch in multiple tries, only the first try gets the value 1 in win (see Abe in the example above).
If two different names hold the same greatest Lenght, both get the value 1 in win.

What I have managed to do so far is:

df.groupby(['Batch', 'Name'])['Lenght'].apply(lambda x: (x == x.max()).map({True: 1, False: 0}))

But it doesn't cover all the conditions. Any insight would be highly appreciated.

Expected output:

df = pd.DataFrame({'Name': ['Abe', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy'],
                   'Lenght': ['10', '12.5', '11', '12.5', '12', '11', '12.5', '10', '5'],
                   'Try': [0,0,0,1,1,1,2,2,2],
                   'Batch':[0,0,0,0,0,0,0,0,0],
                   'win':[0,1,0,1,0,0,0,0,0]})

Many thanks,

Answer 1

Use GroupBy.transform to get the max value per group and compare it with the Lenght column using Series.eq. Then cast the boolean result to integers with Series.astype, which maps True to 1 and False to 0:

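# First row changed to duplicate the second row (Karl, 12.5, Try 0)
df = pd.DataFrame({'Name': ['Karl', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy'],
                   'Lenght': ['12.5', '12.5', '11', '12.5', '12', '11', '12.5', '10', '5'],
                   'Try': [0,0,0,1,1,1,2,2,2],
                   'Batch':[0,0,0,0,0,0,0,0,0]})

# Lenght is stored as strings; cast to float so max compares numerically
df['Lenght'] = df['Lenght'].astype(float)

# m1: rows whose Lenght equals the maximum of their Batch
m1 = df.groupby('Batch')['Lenght'].transform('max').eq(df['Lenght'])

# df1: only the rows holding the batch maximum
df1 = df[m1]
# m2: names that reach the maximum in a single Try only
m2 = df1.groupby('Name')['Try'].transform('nunique').eq(1)
# m3: first maximal row per Name and Batch
m3 = ~df1.duplicated(['Name','Batch'])

# m2 and m3 are indexed like df1; rows missing from df1 end up as 0
df['new'] = ((m2 | m3) & m1).astype(int)
print(df)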
    Name  Lenght  Try  Batch  new
0   Karl    12.5    0      0    1
1   Karl    12.5    0      0    1
2  Billy    11.0    0      0    0
3    Abe    12.5    1      0    1
4   Karl    12.0    1      0    0
5  Billy    11.0    1      0    0
6    Abe    12.5    2      0    0
7   Karl    10.0    2      0    0
8  Billy     5.0    2      0    0
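
As a cross-check against the question's original data, here is a simplified variant. It is not the answer's exact logic: it keeps one row per (Batch, Name) among the rows holding the batch maximum, so unlike the m2 mask above it would mark only one of two identical rows within the same Try. The names `is_max`, `winners` and `first_idx` are introduced here for illustration.

```python
import pandas as pd

# Original data from the question (Lenght given as strings).
df = pd.DataFrame({
    'Name': ['Abe', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy'],
    'Lenght': ['10', '12.5', '11', '12.5', '12', '11', '12.5', '10', '5'],
    'Try': [0, 0, 0, 1, 1, 1, 2, 2, 2],
    'Batch': [0, 0, 0, 0, 0, 0, 0, 0, 0],
})
df['Lenght'] = df['Lenght'].astype(float)

# Rows that reach the maximum Lenght of their Batch.
is_max = df['Lenght'].eq(df.groupby('Batch')['Lenght'].transform('max'))

# Among those rows, order by Try and keep the earliest row per (Batch, Name).
winners = df[is_max].sort_values('Try', kind='stable')
first_idx = winners.drop_duplicates(['Batch', 'Name']).index

# Mark the selected rows with 1, everything else with 0.
df['win'] = 0
df.loc[first_idx, 'win'] = 1
print(df['win'].tolist())  # [0, 1, 0, 1, 0, 0, 0, 0, 0]
```

The result matches the expected win column from the question: Karl wins at Try 0, and Abe wins only at Try 1, the first of his two maximal tries.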

Answer 1: score 2, accepted, 2020-03-01 12:15:56
