
Pandas Dataframe: Creating column based on multiple conditions

Edited

I'm sorry I didn't post it right the first time. The suggested solutions worked only if each Lead ID had exactly two entries with Lead Status "A". I have changed my data. Again, I apologize.

Data:

Lead ID     Lead Status      Duration     Target
1-1H9C0XL   Too Small       -0.466177     1
1-1H9G33C   A               -0.620709     0 
1-1H9G33C   A               -0.500709     0
1-1H9G33C   A                0.337401     0
4-1HFORF8   No Fit          -0.343840     1
4-1HFSXOG   No Fit          -0.124920     1
4-1HLQ2IJ   A               -0.330962     0 
4-1HLQ2IJ   A                0.130818     0
4-1HLQ2IJ   A               -0.400817     0
4-1HLQ2IJ   A                0.240818     0

I want to accomplish the following:

If a combination of Lead ID and Lead Status appears more than once, set Target to "1" for every row of that Lead ID except the one with the longest Duration.

Desired Output

Lead ID     Lead Status      Duration     Target
1-1H9C0XL   Too Small       -0.466177     1
1-1H9G33C   A               -0.620709     1 
1-1H9G33C   A               -0.500709     1
1-1H9G33C   A                0.337401     0
4-1HFORF8   No Fit          -0.343840     1
4-1HFSXOG   No Fit          -0.124920     1
4-1HLQ2IJ   A               -0.330962     1 
4-1HLQ2IJ   A                0.130818     1
4-1HLQ2IJ   A               -0.400817     1
4-1HLQ2IJ   A                0.240818     0

I am not able to implement a condition that checks for duplicates and uses the Duration value to update the last column. I would appreciate any assistance.

Try this (assuming your df is sorted):

df.loc[df[df.duplicated(['LeadID','LeadStatus'],keep=False)].drop_duplicates(['LeadID','LeadStatus'],keep='first').index,'Target']=1
df
Out[895]: 
      LeadID LeadStatus  Duration  Target
0  1-1H9C0XL   TooSmall    -0.466       1
1  1-1H9G33C          A    -0.621       1
2  1-1H9G33C          A     0.337       0
3  4-1HFORF8      NoFit    -0.344       1
4  4-1HFSXOG      NoFit    -0.125       1
5  4-1HLQ2IJ          A    -0.331       1
6  4-1HLQ2IJ          A     0.241       0

Update


df=df.sort_values(['LeadID','LeadStatus','Duration'])

df.loc[df[df.duplicated(['LeadID','LeadStatus'],keep='last')].index,'Target']=1

Out[911]: 
      LeadID LeadStatus  Duration  Target
0  1-1H9C0XL   TooSmall    -0.466       1
1  1-1H9G33C          A    -0.621       1
2  1-1H9G33C          A    -0.501       1
3  1-1H9G33C          A     0.337       0
4  4-1HFORF8      NoFit    -0.344       1
5  4-1HFSXOG      NoFit    -0.125       1
8  4-1HLQ2IJ          A    -0.401       1
6  4-1HLQ2IJ          A    -0.331       1
7  4-1HLQ2IJ          A     0.131       1
9  4-1HLQ2IJ          A     0.241       0
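
For anyone who wants to reproduce the update above, here is a minimal self-contained sketch. It assumes the question's data typed in by hand and this answer's column spelling (LeadID, LeadStatus); the name `shorter` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question; column names follow this answer's spelling.
df = pd.DataFrame({
    "LeadID": ["1-1H9C0XL", "1-1H9G33C", "1-1H9G33C", "1-1H9G33C",
               "4-1HFORF8", "4-1HFSXOG",
               "4-1HLQ2IJ", "4-1HLQ2IJ", "4-1HLQ2IJ", "4-1HLQ2IJ"],
    "LeadStatus": ["Too Small", "A", "A", "A", "No Fit", "No Fit",
                   "A", "A", "A", "A"],
    "Duration": [-0.466177, -0.620709, -0.500709, 0.337401, -0.343840,
                 -0.124920, -0.330962, 0.130818, -0.400817, 0.240818],
    "Target": [1, 0, 0, 0, 1, 1, 0, 0, 0, 0],
})

# Sort so the longest Duration is the last row of each (LeadID, LeadStatus) group.
df = df.sort_values(["LeadID", "LeadStatus", "Duration"])

# keep='last' flags every row of a group except the last one,
# i.e. every row except the one with the longest Duration.
shorter = df.duplicated(["LeadID", "LeadStatus"], keep="last")
df.loc[shorter, "Target"] = 1

# Restore the original row order for display.
print(df.sort_index())
```

Rows whose LeadID and LeadStatus pair occurs only once are never flagged, so their existing Target is left untouched.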

Here is an idiomatic and performant answer.

df['Target'] += df.sort_values('Duration')\
                  .duplicated(subset=['Lead ID', 'Lead Status'], keep='last')

If you can't assume that unique rows already have a 1, you can do the following.

df1 = df.sort_values('Duration')
unique = ~df1.duplicated(subset=['Lead ID', 'Lead Status'], keep=False) * 1
first = df1.duplicated(subset=['Lead ID', 'Lead Status'], keep='last') * 1
df['Target'] = unique + first
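
It may help to see why assigning back to the unsorted df works: pandas aligns on index labels, not on row position. A small sketch on a subset of the question's rows, assuming there is no Target column to begin with (the `keys` variable is just a convenience introduced here):

```python
import pandas as pd

# Subset of the question's rows, deliberately without a Target column.
df = pd.DataFrame({
    "Lead ID": ["1-1H9C0XL", "1-1H9G33C", "1-1H9G33C", "1-1H9G33C", "4-1HFORF8"],
    "Lead Status": ["Too Small", "A", "A", "A", "No Fit"],
    "Duration": [-0.466177, -0.620709, -0.500709, 0.337401, -0.343840],
})

keys = ["Lead ID", "Lead Status"]
df1 = df.sort_values("Duration")

# 1 for rows whose key pair occurs exactly once.
unique = (~df1.duplicated(subset=keys, keep=False)).astype(int)

# 1 for every duplicated row except the one with the longest Duration.
first = df1.duplicated(subset=keys, keep="last").astype(int)

# Both Series keep df's original index labels, so the assignment lines up
# with df even though df1 is in a different row order.
df["Target"] = unique + first

print(df)
```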

And a less performant way:

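# 1 for single-row groups; otherwise 1 where Duration is below the group's maximum
df['Target'] = df.groupby(['Lead ID', 'Lead Status'])['Duration']\
  .transform(lambda x: 1 if len(x) == 1 else x < x.max())\
  .astype(int)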

     Lead ID Lead Status  Duration  Target
0  1-1H9C0XL   Too Small -0.466177       1
1  1-1H9G33C           A -0.620709       1
2  1-1H9G33C           A -0.500709       1
3  1-1H9G33C           A  0.337401       0
4  4-1HFORF8      No Fit -0.343840       1
5  4-1HFSXOG      No Fit -0.124920       1
6  4-1HLQ2IJ           A -0.330962       1
7  4-1HLQ2IJ           A  0.130818       1
8  4-1HLQ2IJ           A -0.400817       1
9  4-1HLQ2IJ           A  0.240818       0
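
If the per-group lambda turns out to be the slow part, one possible variant (not from the answers above, just a sketch) swaps it for the built-in "max" and "size" aggregations passed to transform. The names `keys`, `group_max` and `group_size` are introduced here for illustration, and the data is a subset of the question's rows.

```python
import pandas as pd

# Subset of the question's rows: one single-row group and one four-row group.
df = pd.DataFrame({
    "Lead ID": ["4-1HFSXOG", "4-1HLQ2IJ", "4-1HLQ2IJ", "4-1HLQ2IJ", "4-1HLQ2IJ"],
    "Lead Status": ["No Fit", "A", "A", "A", "A"],
    "Duration": [-0.124920, -0.330962, 0.130818, -0.400817, 0.240818],
})

keys = ["Lead ID", "Lead Status"]
grouped = df.groupby(keys)["Duration"]

# Broadcast each group's maximum Duration and row count back onto its rows.
group_max = grouped.transform("max")
group_size = grouped.transform("size")

# Target is 1 for single-row groups and for rows below their group's maximum.
df["Target"] = ((group_size == 1) | (df["Duration"] < group_max)).astype(int)

print(df)
```

Note that if two rows in a group tie for the longest Duration, both get 0 here, whereas the duplicated(keep='last') approaches would set one of them to 1.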


