Pandas add new column with groupby based on values in 3 different columns
I have the following df:
   Document        Date  Schedule  Quantity  Key
0       123  2020-12-02         1        20    1
1       123  2020-12-02         2        10    0
2       123  2020-12-02         3         5    0
3       456  2020-12-02         4        10    0
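For anyone who wants to reproduce this, the frame above could be built roughly as follows. This is only a sketch: the question does not say what dtype Date has, so parsing it with pd.to_datetime is an assumption.

```python
import pandas as pd

# Sample data copied from the table in the question.
# Assumption: Date is stored as a datetime column (the question does not say).
df = pd.DataFrame({
    "Document": [123, 123, 123, 456],
    "Date": pd.to_datetime(["2020-12-02"] * 4),
    "Schedule": [1, 2, 3, 4],
    "Quantity": [20, 10, 5, 10],
    "Key": [1, 0, 0, 0],
})

print(df)
```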
I want to add a new column, New_Col, computed per group of Document and Date. Within each group, compare the Quantity in row 0 (the row where Key = 1) with the Quantity in the row that has the lowest Schedule among the remaining rows (those where Key = 0). If the two quantities differ, New_Col = 1. If they are the same, New_Col = 0.
Desired output:
   Document        Date  Schedule  Quantity  Key  New_Col
0       123  2020-12-02         1        20    1        1
1       123  2020-12-02         2        10    0        0
2       123  2020-12-02         3         5    0        0
3       456  2020-12-02         4        10    0        0
Define the following function:
import pandas as pd

def getNewCol(grp):
    # Default result: 0 for every row of the group
    rv = pd.Series(0, index=grp.index)
    # Quantity from the row with Key == 1 (a Series)
    qn = grp.query('Key == 1').Quantity
    if qn.size == 0:  # No Key == 1 row in this group
        return rv
    qnK1 = qn.iloc[0]  # The Quantity itself
    # "Other" rows, i.e. those with Key != 1
    others = grp.query('Key != 1')
    if others.empty:  # Nothing to compare against
        return rv
    # Quantity from the "other" row with the lowest Schedule
    qnMin = others.sort_values('Schedule').Quantity.iloc[0]
    if qnK1 != qnMin:  # Different
        # Set the first element of the result
        # (assumes the Key == 1 row comes first in its group)
        rv.iloc[0] = 1
    return rv
Then apply it and save the result in a new column:
df['New_Col'] = df.groupby(['Document', 'Date'], as_index=False)\
    .apply(getNewCol).reset_index(level=0, drop=True)
The result is:
   Document        Date  Schedule  Quantity  Key  New_Col
0       123  2020-12-02         1        20    1        1
1       123  2020-12-02         2        10    0        0
2       123  2020-12-02         3         5    0        0
3       456  2020-12-02         4        10    0        0
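If groupby.apply turns out to be slow on a large frame, the same rule can probably be expressed without a per-group Python function. The sketch below is an alternative technique, not the approach above: it looks up, for each (Document, Date) group, the Quantity on the Key == 0 row with the lowest Schedule, merges that back onto every row, and compares it on the Key == 1 rows. The names keys, ref, RefQty, ref_qty and is_key are illustrative only. Two assumptions are made here: groups with no Key == 0 row simply get 0, and the flag is placed on the Key == 1 row itself rather than on the first row of the group.

```python
import pandas as pd

# Sample data from the question (Date as datetime is an assumption).
df = pd.DataFrame({
    "Document": [123, 123, 123, 456],
    "Date": pd.to_datetime(["2020-12-02"] * 4),
    "Schedule": [1, 2, 3, 4],
    "Quantity": [20, 10, 5, 10],
    "Key": [1, 0, 0, 0],
})

keys = ["Document", "Date"]

# One row per group: the Key == 0 row with the lowest Schedule.
ref = (
    df[df["Key"] == 0]
    .sort_values("Schedule")
    .drop_duplicates(keys)  # keeps the first, i.e. lowest-Schedule, row per group
    [keys + ["Quantity"]]
    .rename(columns={"Quantity": "RefQty"})
)

# A left merge keeps the row order of df; groups without a Key == 0 row get NaN.
ref_qty = df.merge(ref, on=keys, how="left")["RefQty"].to_numpy()

# Flag only Key == 1 rows whose Quantity differs from the reference quantity.
is_key = df["Key"].eq(1)
differs = df["Quantity"].to_numpy() != ref_qty
df["New_Col"] = (is_key & pd.notna(ref_qty) & differs).astype(int)

print(df)
```

On the sample data this gives the same New_Col values as the desired output. Because ref has at most one row per group, the merge cannot duplicate rows of df, which is what makes the positional assignment via to_numpy safe.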