Pandas DataFrame: creating a column based on multiple conditions

Question

Edited

Sorry, I didn't post this correctly the first time. The suggested solutions work only when each Lead ID with Lead Status "A" has exactly two entries, so I have changed my data. Apologies again.

Data:

Lead ID     Lead Status      Duration     Target
1-1H9C0XL   Too Small       -0.466177     1
1-1H9G33C   A               -0.620709     0 
1-1H9G33C   A               -0.500709     0
1-1H9G33C   A                0.337401     0
4-1HFORF8   No Fit          -0.343840     1
4-1HFSXOG   No Fit          -0.124920     1
4-1HLQ2IJ   A               -0.330962     0 
4-1HLQ2IJ   A                0.130818     0
4-1HLQ2IJ   A               -0.400817     0
4-1HLQ2IJ   A                0.240818     0

I want to accomplish the following:

If a Lead ID and Lead Status pair is duplicated, set Target to 1 for every row of that Lead ID with a shorter Duration, that is, every row in the group except the one with the longest Duration.

Desired output

Lead ID     Lead Status      Duration     Target
1-1H9C0XL   Too Small       -0.466177     1
1-1H9G33C   A               -0.620709     1 
1-1H9G33C   A               -0.500709     1
1-1H9G33C   A                0.337401     0
4-1HFORF8   No Fit          -0.343840     1
4-1HFSXOG   No Fit          -0.124920     1
4-1HLQ2IJ   A               -0.330962     1 
4-1HLQ2IJ   A                0.130818     1
4-1HLQ2IJ   A               -0.400817     1
4-1HLQ2IJ   A                0.240818     0

I can't work out how to combine the duplicate check with the Duration comparison to update the last column. Any assistance is much appreciated.

Answer 1

Try this (assuming your df is sorted; the column names here are written without spaces):

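# Among rows whose (LeadID, LeadStatus) pair repeats, set Target on the first row of each group.
dup = df[df.duplicated(['LeadID', 'LeadStatus'], keep=False)]
df.loc[dup.drop_duplicates(['LeadID', 'LeadStatus'], keep='first').index, 'Target'] = 1
df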
Out[895]: 
      LeadID LeadStatus  Duration  Target
0  1-1H9C0XL   TooSmall    -0.466       1
1  1-1H9G33C          A    -0.621       1
2  1-1H9G33C          A     0.337       0
3  4-1HFORF8      NoFit    -0.344       1
4  4-1HFSXOG      NoFit    -0.125       1
5  4-1HLQ2IJ          A    -0.331       1
6  4-1HLQ2IJ          A     0.241       0

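Update

# Sort so the longest Duration comes last within each (LeadID, LeadStatus) group.
df = df.sort_values(['LeadID', 'LeadStatus', 'Duration'])

# Every row with a later duplicate, i.e. all but the longest Duration, gets Target 1.
df.loc[df[df.duplicated(['LeadID', 'LeadStatus'], keep='last')].index, 'Target'] = 1
df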

Out[911]: 
      LeadID LeadStatus  Duration  Target
0  1-1H9C0XL   TooSmall    -0.466       1
1  1-1H9G33C          A    -0.621       1
2  1-1H9G33C          A    -0.501       1
3  1-1H9G33C          A     0.337       0
4  4-1HFORF8      NoFit    -0.344       1
5  4-1HFSXOG      NoFit    -0.125       1
8  4-1HLQ2IJ          A    -0.401       1
6  4-1HLQ2IJ          A    -0.331       1
7  4-1HLQ2IJ          A     0.131       1
9  4-1HLQ2IJ          A     0.241       0
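
For anyone who wants to run the update end to end, here is a minimal sketch that rebuilds the question's data and applies the same sort-then-duplicated idea. It keeps the question's column names with spaces (Lead ID, Lead Status), and the names keys, ordered and shorter are my own, not part of the answer. Unlike the snippet above, it assigns through the index labels, so df stays in its original row order.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "Lead ID": ["1-1H9C0XL", "1-1H9G33C", "1-1H9G33C", "1-1H9G33C",
                "4-1HFORF8", "4-1HFSXOG",
                "4-1HLQ2IJ", "4-1HLQ2IJ", "4-1HLQ2IJ", "4-1HLQ2IJ"],
    "Lead Status": ["Too Small", "A", "A", "A", "No Fit", "No Fit",
                    "A", "A", "A", "A"],
    "Duration": [-0.466177, -0.620709, -0.500709, 0.337401,
                 -0.343840, -0.124920,
                 -0.330962, 0.130818, -0.400817, 0.240818],
    "Target": [1, 0, 0, 0, 1, 1, 0, 0, 0, 0],
})

keys = ["Lead ID", "Lead Status"]  # hypothetical helper name

# Sort a copy so the longest Duration is last within each group.
ordered = df.sort_values(keys + ["Duration"])

# True for every row that has a later duplicate in its group.
shorter = ordered.duplicated(keys, keep="last")

# Write back by index label; df itself is never reordered.
df.loc[shorter[shorter].index, "Target"] = 1

print(df["Target"].tolist())  # [1, 1, 1, 0, 1, 1, 1, 1, 1, 0]
```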

Answer 2

Here is an idiomatic and performant answer.

# After sorting by Duration, every row with a later duplicate is True and adds 1.
df['Target'] += df.sort_values('Duration')\
                  .duplicated(subset=['Lead ID', 'Lead Status'], keep='last')
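
The one-liner relies on pandas aligning by index label rather than by position: the boolean Series comes back in Duration order, but the addition still matches each flag to its original row. A small sketch of that behaviour, using a made-up three-row frame (the data and the name flags are assumptions for illustration only):

```python
import pandas as pd

# Toy data, not from the question.
df = pd.DataFrame({
    "Lead ID": ["x", "x", "y"],
    "Duration": [0.3, -0.6, 0.1],
    "Target": [0, 0, 1],
})

# Flags are computed on the sorted frame, so their index is reordered.
flags = df.sort_values("Duration").duplicated(subset=["Lead ID"], keep="last")
print(flags.index.tolist())  # [1, 2, 0]

# The addition aligns on index labels, so row 1 (the shorter "x") gets the 1.
df["Target"] += flags
print(df["Target"].tolist())  # [0, 1, 1]
```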

If you can't assume that unique rows already have a Target of 1, you can build the column from scratch as follows.

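df1 = df.sort_values('Duration')
# 1 for rows whose (Lead ID, Lead Status) pair occurs only once
unique = ~df1.duplicated(subset=['Lead ID', 'Lead Status'], keep=False) * 1
# 1 for every duplicated row except the one with the longest Duration
first = df1.duplicated(subset=['Lead ID', 'Lead Status'], keep='last') * 1
df['Target'] = unique + first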
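
Both snippets depend on what the keep argument of duplicated does, so here is a quick sketch of its three settings on a toy frame (the column name key and the data are made up for illustration):

```python
import pandas as pd

toy = pd.DataFrame({"key": ["a", "b", "b", "b"]})

# keep="first": the first occurrence is not flagged, later ones are.
first_kept = toy.duplicated("key", keep="first").tolist()
# keep="last": the last occurrence is not flagged, earlier ones are.
last_kept = toy.duplicated("key", keep="last").tolist()
# keep=False: every member of a repeated group is flagged.
none_kept = toy.duplicated("key", keep=False).tolist()

print(first_kept)  # [False, False, True, True]
print(last_kept)   # [False, True, True, False]
print(none_kept)   # [False, True, True, True]
```

So negating keep=False picks out the singleton rows, and keep='last' on Duration-sorted data picks out every duplicated row except the longest one, which is why the two pieces never overlap when added.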

And a less performant way:

# True for singleton groups and for every row shorter than its group's longest Duration.
df['Target'] = df.groupby(['Lead ID', 'Lead Status'])['Duration']\
                 .transform(lambda x: (len(x) == 1) | (x < x.max()))\
                 .astype(int)
df

     Lead ID Lead Status  Duration  Target
0  1-1H9C0XL   Too Small -0.466177       1
1  1-1H9G33C           A -0.620709       1
2  1-1H9G33C           A -0.500709       1
3  1-1H9G33C           A  0.337401       0
4  4-1HFORF8      No Fit -0.343840       1
5  4-1HFSXOG      No Fit -0.124920       1
6  4-1HLQ2IJ           A -0.330962       1
7  4-1HLQ2IJ           A  0.130818       1
8  4-1HLQ2IJ           A -0.400817       1
9  4-1HLQ2IJ           A  0.240818       0
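
If the lambda turns out to be the slow part, one possible variation is to swap it for the built-in "size" and "max" aggregations passed to transform. This is only a sketch on the first five rows of the question's data; the names keys, grouped, is_single and is_shorter are mine, not from the answer.

```python
import pandas as pd

# First five rows of the question's data; Target is rebuilt from scratch.
df = pd.DataFrame({
    "Lead ID": ["1-1H9C0XL", "1-1H9G33C", "1-1H9G33C", "1-1H9G33C", "4-1HFORF8"],
    "Lead Status": ["Too Small", "A", "A", "A", "No Fit"],
    "Duration": [-0.466177, -0.620709, -0.500709, 0.337401, -0.343840],
})

keys = ["Lead ID", "Lead Status"]  # hypothetical helper name
grouped = df.groupby(keys)["Duration"]

# Group size and group maximum, broadcast back to one value per row.
is_single = grouped.transform("size") == 1
is_shorter = df["Duration"] < grouped.transform("max")

df["Target"] = (is_single | is_shorter).astype(int)
print(df["Target"].tolist())  # [1, 1, 1, 0, 1]
```

One difference to be aware of: if two rows in a group tie for the longest Duration, this comparison leaves both at 0, whereas the sort-and-duplicated approach leaves only one of them at 0.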
