Conditional grouping pandas DataFrame
I have a DataFrame with the following columns:
df = pd.DataFrame({'Name': ['Abe', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy'],
'Lenght': ['10', '12.5', '11', '12.5', '12', '11', '12.5', '10', '5'],
'Try': [0,0,0,1,1,1,2,2,2],
'Batch':[0,0,0,0,0,0,0,0,0]})
In each Batch, a Name gets arbitrarily many tries (Try) to reach the greatest Lenght.
What I want to do is create a column win that has the value 1 for the greatest Lenght in a Batch and 0 otherwise, with the following conditions.
If one Name holds the greatest Lenght in a Batch in multiple Try values, only the first Try will have the value 1 in win (see Abe in the example above).
If two separate Names hold an equal greatest Lenght, then both will have the value 1 in win.
What I have managed to do so far is:
df.groupby(['Batch', 'Name'])['Lenght'].apply(lambda x: (x == x.max()).map({True: 1, False: 0}))
But it doesn't support all the conditions. Any insight would be highly appreciated. Many thanks.
Expected output:
df = pd.DataFrame({'Name': ['Abe', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy'],
'Lenght': ['10', '12.5', '11', '12.5', '12', '11', '12.5', '10', '5'],
'Try': [0,0,0,1,1,1,2,2,2],
'Batch':[0,0,0,0,0,0,0,0,0],
'win':[0,1,0,1,0,0,0,0,0]})
Use GroupBy.transform to get the max value per group, compare it with the Lenght column using Series.eq for equality, and cast the resulting True/False values to the integers 1/0 with Series.astype:
import pandas as pd

# first row changed to duplicate the second row's data (same Name, Lenght and Try)
df = pd.DataFrame({'Name': ['Karl', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy'],
'Lenght': ['12.5', '12.5', '11', '12.5', '12', '11', '12.5', '10', '5'],
'Try': [0,0,0,1,1,1,2,2,2],
'Batch':[0,0,0,0,0,0,0,0,0]})
df['Lenght'] = df['Lenght'].astype(float)
m1 = df.groupby('Batch')['Lenght'].transform('max').eq(df['Lenght'])
df1 = df[m1]
m2 = df1.groupby('Name')['Try'].transform('nunique').eq(1)
m3 = ~df1.duplicated(['Name','Batch'])
df['new'] = ((m2 | m3) & m1).astype(int)
print (df)
    Name  Lenght  Try  Batch  new
0   Karl    12.5    0      0    1
1   Karl    12.5    0      0    1
2  Billy    11.0    0      0    0
3    Abe    12.5    1      0    1
4   Karl    12.0    1      0    0
5  Billy    11.0    1      0    0
6    Abe    12.5    2      0    0
7   Karl    10.0    2      0    0
8  Billy     5.0    2      0    0
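
As a check against the question's original data, the same masks can be wrapped in a small helper. This is only a sketch: the function name mark_wins is hypothetical. Grouping by both Batch and Name, and using reindex to align the partial mask back to the full index, are my assumptions for data with more than one batch. It also assumes the rows are already ordered by Try within each Batch, so that the first occurrence is the first try.

```python
import pandas as pd

def mark_wins(df):
    """Hypothetical helper: return a 0/1 Series flagging the winning rows."""
    # Lenght is stored as strings in the question, so convert before comparing
    lenght = df['Lenght'].astype(float)
    # rows that hold the greatest Lenght within their Batch
    is_max = lenght.groupby(df['Batch']).transform('max').eq(lenght)
    top = df[is_max]
    # a Name whose top rows all share a single Try keeps every such row
    single_try = top.groupby(['Batch', 'Name'])['Try'].transform('nunique').eq(1)
    # otherwise only the first top row per Name and Batch is kept
    first = ~top.duplicated(['Name', 'Batch'])
    # align the partial mask with the full index; rows outside top become False
    keep = (single_try | first).reindex(df.index, fill_value=False)
    return (keep & is_max).astype(int)

# data from the question
df = pd.DataFrame({'Name': ['Abe', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy', 'Abe', 'Karl', 'Billy'],
                   'Lenght': ['10', '12.5', '11', '12.5', '12', '11', '12.5', '10', '5'],
                   'Try': [0, 0, 0, 1, 1, 1, 2, 2, 2],
                   'Batch': [0, 0, 0, 0, 0, 0, 0, 0, 0]})
df['win'] = mark_wins(df)
print(df['win'].tolist())  # [0, 1, 0, 1, 0, 0, 0, 0, 0]
```

On this data Karl wins at Try 0 and Abe only at Try 1, his first try with 12.5, which matches the expected win column in the question.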