How to compare two dates by iterating over a pandas data frame and create a new column
I have a pandas data frame with customer transactions, as shown below, and I want to create a column named 'Label' with 2 different values:
New Transaction performed before the end date of the previous transaction
New Transaction performed after the end date of the previous transaction
Input
Transaction ID  Transaction Start Date  Transaction End Date
1               23-jun-2014             15-Jul-2014
2               14-jul-2014             8-Aug-2014
3               13-Aug-2014             22-Aug-2014
4               21-Aug-2014             28-Aug-2014
5               29-Aug-2014             05-Sep-2014
6               06-Sep-2014             15-Sep-2014
Desired output
Transaction ID  Transaction Start Date  Transaction End Date  Label
1               23-jun-2014             15-Jul-2014
2               14-jul-2014             8-Aug-2014            New Transaction performed before end date of previous transaction
3               13-Aug-2014             22-Aug-2014           New Transaction after the end date of previous transaction.
4               21-Aug-2014             28-Aug-2014           New Transaction performed before the end date of previous transaction.
5               29-Aug-2014             05-Sep-2014           New Transaction after the end date of previous transaction.
6               06-Sep-2014             15-Sep-2014           New Transaction after the end date of previous transaction.
Use numpy.where and Series.shift:
import numpy as np
df['Label'] = np.where(df['Transaction Start Date'].lt(df['Transaction End Date'].shift()), 'New Transaction performed before end date of previous transaction', 'New Transaction after the end date of previous transaction.')
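The comparison is only chronological if both columns hold datetime values rather than strings. A minimal sketch of why that matters and of what Series.shift returns, using values from rows 2 and 3 of the question; the explicit format string '%d-%b-%Y' is an assumption based on how the dates are written there:

```python
import pandas as pd

# Sample values taken from rows 2 and 3 of the question.
prev_end_str = '8-Aug-2014'    # end date of transaction 2
start_str = '13-Aug-2014'      # start date of transaction 3

# Raw strings compare character by character, so '1' < '8' decides the result.
string_result = start_str < prev_end_str

# Parsed dates compare chronologically. The format is an assumption based on
# the day-month-year strings shown in the question.
fmt = '%d-%b-%Y'
date_result = pd.to_datetime(start_str, format=fmt) < pd.to_datetime(prev_end_str, format=fmt)

# Series.shift moves every value down one row and leaves a missing value first,
# so each row lines up with the end date of the row before it.
ends = pd.to_datetime(pd.Series(['15-Jul-2014', '8-Aug-2014', '22-Aug-2014']), format=fmt)
prev_ends = ends.shift()

print(string_result, date_result)
print(prev_ends)
```

With strings, transaction 3 would be labelled as starting before the previous end date, although 13 August is after 8 August.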
Use to_datetime first, then numpy.where with Series.lt to test whether each start date is less than the previous row's end date (obtained with Series.shift), and finally set the first value to an empty string:
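import pandas as pd

# Parse both date columns so they compare chronologically, not as strings
df['Transaction End Date'] = pd.to_datetime(df['Transaction End Date'])
df['Transaction Start Date'] = pd.to_datetime(df['Transaction Start Date'])
# The condition is True where a transaction starts before the previous row's end date
df['Label'] = np.where(df['Transaction Start Date'].lt(df['Transaction End Date'].shift()),
                       'New Transaction performed before end date of previous transaction',
                       'New Transaction after the end date of previous transaction.')
# The first row has no previous transaction (assumes the default 0-based index)
df.loc[0, 'Label'] = ''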
Alternative solution:
m = df['Transaction Start Date'].lt(df['Transaction End Date'].shift())
# Build all labels, then replace the first one with an empty string
df['Label'] = [''] + np.where(m,
                              'New Transaction performed before end date of previous transaction',
                              'New Transaction after the end date of previous transaction.')[1:].tolist()
print(df)
   Transaction ID Transaction Start Date Transaction End Date  \
0               1             2014-06-23           2014-07-15
1               2             2014-07-14           2014-08-08
2               3             2014-08-13           2014-08-22
3               4             2014-08-21           2014-08-28
4               5             2014-08-29           2014-09-05
5               6             2014-09-06           2014-09-15

                                               Label
0
1  New Transaction performed before end date of p...
2  New Transaction after the end date of previous...
3  New Transaction performed before end date of p...
4  New Transaction after the end date of previous...
5  New Transaction after the end date of previous...
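For anyone who wants to reproduce this end to end, here is a self-contained sketch that rebuilds the sample data from the question and applies the same to_datetime, shift, and numpy.where steps. The helper name add_label, the constants BEFORE and AFTER, and the explicit format '%d-%b-%Y' are assumptions made for illustration, not part of the original answer:

```python
import numpy as np
import pandas as pd

BEFORE = 'New Transaction performed before end date of previous transaction'
AFTER = 'New Transaction after the end date of previous transaction.'

# Sample data copied from the question.
df = pd.DataFrame({
    'Transaction ID': [1, 2, 3, 4, 5, 6],
    'Transaction Start Date': ['23-jun-2014', '14-jul-2014', '13-Aug-2014',
                               '21-Aug-2014', '29-Aug-2014', '06-Sep-2014'],
    'Transaction End Date': ['15-Jul-2014', '8-Aug-2014', '22-Aug-2014',
                             '28-Aug-2014', '05-Sep-2014', '15-Sep-2014'],
})


def add_label(frame):
    """Hypothetical helper: return a copy of frame with a 'Label' column."""
    out = frame.copy()
    # Parse the date strings (format assumed from the question's sample rows).
    for col in ['Transaction Start Date', 'Transaction End Date']:
        out[col] = pd.to_datetime(out[col], format='%d-%b-%Y')
    # End date of the previous row; the first row gets a missing value.
    prev_end = out['Transaction End Date'].shift()
    overlaps = out['Transaction Start Date'].lt(prev_end)
    out['Label'] = np.where(overlaps, BEFORE, AFTER)
    # The first row has no previous transaction, so blank it by position.
    out.iloc[0, out.columns.get_loc('Label')] = ''
    return out


labelled = add_label(df)
print(labelled['Label'].tolist())
```

Blanking the first row by position with iloc, instead of df.loc[0, 'Label'], is a design choice in this sketch so that it also works when the index does not start at 0.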