
Conditional grouping pandas DataFrame

I have a DataFrame with the following columns:

import pandas as pd

df = pd.DataFrame({'Name': ['Abe', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy'],
                   'Lenght': ['10', '12.5', '11', '12.5', '12', '11', '12.5', '10', '5'],
                   'Try': [0,0,0,1,1,1,2,2,2],
                   'Batch':[0,0,0,0,0,0,0,0,0]})

In each Batch, a Name gets arbitrarily many tries to reach the greatest Lenght. I want to create a column win that has the value 1 for the greatest Lenght in a batch and 0 otherwise, subject to the following conditions.

  • If one Name holds the greatest Lenght of a batch in multiple tries, only the first Try gets the value 1 in win (see Abe in the example above).

  • If two separate names hold an equal greatest Lenght, both get the value 1 in win.

What I have managed to do so far is:

df.groupby(['Batch', 'Name'])['Lenght'].apply(lambda x: (x == x.max()).map({True: 1, False: 0}))

But it doesn't handle all the conditions. Any insight would be highly appreciated.

Expected output:

df = pd.DataFrame({'Name': ['Abe', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy'],
                   'Lenght': ['10', '12.5', '11', '12.5', '12', '11', '12.5', '10', '5'],
                   'Try': [0,0,0,1,1,1,2,2,2],
                   'Batch':[0,0,0,0,0,0,0,0,0],
                   'win':[0,1,0,1,0,0,0,0,0]})

Many thanks.

Use GroupBy.transform to get the max value per group, compare it with the Lenght column using Series.eq, and cast the resulting True/False values to the integers 1/0 with Series.astype:

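import pandas as pd

# test data: the first row is replaced by a copy of the second row
df = pd.DataFrame({'Name': ['Karl', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy'],
                   'Lenght': ['12.5', '12.5', '11', '12.5', '12', '11', '12.5', '10', '5'],
                   'Try': [0,0,0,1,1,1,2,2,2],
                   'Batch':[0,0,0,0,0,0,0,0,0]})

# Lenght is stored as strings, convert it so max compares numbers
df['Lenght'] = df['Lenght'].astype(float)

# rows whose Lenght equals the maximum of their Batch
m1 = df.groupby('Batch')['Lenght'].transform('max').eq(df['Lenght'])

# only the maximal rows
df1 = df[m1]
# names that reach the maximum in a single Try only
m2 = df1.groupby('Name')['Try'].transform('nunique').eq(1)
# first maximal row per Name and Batch
m3 = ~df1.duplicated(['Name','Batch'])

# m2 and m3 cover only the rows of df1, so align them to df before combining
df['new'] = ((m2 | m3).reindex(df.index, fill_value=False) & m1).astype(int)
print(df)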
    Name  Lenght  Try  Batch  new
0   Karl    12.5    0      0    1
1   Karl    12.5    0      0    1
2  Billy    11.0    0      0    0
3    Abe    12.5    1      0    1
4   Karl    12.0    1      0    0
5  Billy    11.0    1      0    0
6    Abe    12.5    2      0    0
7   Karl    10.0    2      0    0
8  Billy     5.0    2      0    0
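
As a check, the same masks can be applied to the original data from the question. The following is only a sketch: the helper name add_win is hypothetical, grouping the second mask by both Batch and Name is an assumption made so that several batches stay independent, and the rows are assumed to be ordered by Try within each batch so that "first" means the earliest try.

```python
import pandas as pd


def add_win(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a 0/1 'win' column (hypothetical helper)."""
    out = df.copy()
    out['Lenght'] = out['Lenght'].astype(float)

    # rows whose Lenght equals the maximum of their Batch
    m1 = out.groupby('Batch')['Lenght'].transform('max').eq(out['Lenght'])

    top = out[m1]
    # assumption: group by Batch and Name so that batches do not interfere
    m2 = top.groupby(['Batch', 'Name'])['Try'].transform('nunique').eq(1)
    # first maximal row of each Name within a Batch (relies on row order)
    m3 = ~top.duplicated(['Batch', 'Name'])

    # m2 and m3 only cover the maximal rows, so align them to the full index
    keep = (m2 | m3).reindex(out.index, fill_value=False)
    out['win'] = (keep & m1).astype(int)
    return out


# data from the question
question_df = pd.DataFrame({
    'Name': ['Abe', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy'],
    'Lenght': ['10', '12.5', '11', '12.5', '12', '11', '12.5', '10', '5'],
    'Try': [0, 0, 0, 1, 1, 1, 2, 2, 2],
    'Batch': [0, 0, 0, 0, 0, 0, 0, 0, 0],
})

result = add_win(question_df)
print(result['win'].tolist())  # [0, 1, 0, 1, 0, 0, 0, 0, 0]
```

On this data Karl wins at Try 0 and Abe only at his first maximal try (Try 1), which matches the expected win column from the question. If the rows are not already ordered by Try, sorting with sort_values(['Batch', 'Try']) before calling the helper would be needed.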
