[英]How do I check if a date in a date column is between two dates in different columns using pandas?
我想弄清楚如何檢查一列中的日期是否在兩個單獨列的日期之間。 我檢查了檢查日期列是否在日期范圍內 - 熊貓,但這不是完全相同的問題。
我使用每一行的唯一標識符來知道是否需要檢查該行的日期。
如果日期在兩個日期之間,我希望將該行的唯一標識符附加到列表中。
我在示例中使用“索引”作為唯一標識符。
import pandas as pd
date_dict = {'check_date': ['10/31/2019 10:00 PM',
'11/10/2012 06:02 PM',
'08/06/2008 02:02 PM',
'05/13/2009 12:19 PM',
'04/19/2008 07:38 PM',
'10/08/2012 01:12 PM',
'11/29/2012 09:41 AM',
'08/03/2016 02:05 AM',
'05/15/2015 12:31 AM',
'04/05/2016 10:21 AM',
'09/26/2018 02:02 PM',
'11/13/2014 02:09 AM',
'02/28/2014 09:58 AM',
'10/02/2015 08:25 PM',
'08/21/2008 06:31 AM',
'05/31/2017 03:48 AM',
'12/16/2010 10:39 PM',
'12/05/2008 08:57 AM',
'08/18/2010 10:35 PM',
'07/06/2010 12:25 AM',
'06/14/2013 07:27 AM',
'09/27/2015 11:06 PM',
'07/03/2014 01:02 AM',
'09/18/2009 04:26 PM',
'01/21/2016 10:56 PM'],
'start_date': ['02/24/2012 12:57 PM',
'09/25/2017 11:35 PM',
'07/05/2015 10:58 PM',
'04/26/2017 04:26 AM',
'09/03/2010 10:50 AM',
'07/08/2017 10:17 AM',
'06/14/2011 02:19 AM',
'03/21/2009 10:11 AM',
'10/22/2012 12:39 AM',
'11/09/2008 05:20 PM',
'12/31/2012 08:51 PM',
'08/26/2013 01:03 PM',
'05/21/2014 01:48 AM',
'11/11/2009 11:55 PM',
'04/23/2012 10:14 AM',
'11/23/2009 09:26 AM',
'08/20/2010 02:13 PM',
'08/09/2019 01:00 AM',
'01/06/2010 03:06 PM',
'02/23/2016 08:23 PM',
'10/30/2019 03:20 AM',
'06/12/2013 06:25 PM',
'02/03/2019 05:46 PM',
'08/07/2011 02:50 PM',
'06/18/2013 03:59 AM'],
'end_date': ['09/06/2014 03:03 AM',
'08/24/2012 12:30 PM',
'05/29/2008 05:48 AM',
'12/31/2014 01:00 AM',
'12/06/2011 05:47 PM',
'04/28/2013 07:01 PM',
'09/17/2017 02:21 AM',
'06/23/2008 03:45 PM',
'01/24/2011 03:04 PM',
'08/05/2015 02:10 AM',
'12/12/2018 11:50 AM',
'08/23/2016 06:31 AM',
'11/21/2018 08:49 AM',
'12/05/2009 03:31 PM',
'04/16/2010 09:24 PM',
'09/08/2012 12:29 PM',
'11/09/2009 08:08 AM',
'11/13/2016 04:21 AM',
'07/17/2018 12:05 PM',
'05/03/2012 06:27 AM',
'09/04/2012 09:11 PM',
'06/26/2014 06:55 AM',
'09/19/2016 08:48 PM',
'05/02/2018 09:03 AM',
'03/22/2015 04:20 AM']}
df = pd.DataFrame(date_dict)
df.reset_index(inplace = True)
df['flag'] = np.where(df['index'] % 2 == 0, 1, 0)
df_list = list(df[df['flag'] == 1]['index'])
analyst_list = []
for flag in df_list:
min_date = df[df['index'] == flag]['check_date']
for index, row in df.iterrows():
start = row['start_date']
end = row['end_date']
if min_date > start and min_date <= end :
analyst_list.append(row['index'])
else:
pass
當我運行上面的代碼時,我收到以下錯誤,我無法過去。
Traceback (most recent call last):
File "<ipython-input-112-fecfeaa05d6d>", line 8, in <module>
if min_date > start and min_date <= end :
File "C:\Users\JORDAN.HOWELL.GITDIR\AppData\Local\Continuum\anaconda3\envs\stan_env\lib\site-packages\pandas\core\generic.py", line 1330, in __nonzero__
f"The truth value of a {type(self).__name__} is ambiguous. "
我不確定日期不明確是怎么回事。 我嘗試將.values
添加到row['start_date']
和row['end_date']
和 'row['check_date']` 中,但沒有幫助。
有誰知道如何做到這一點或我的問題是什么?
在將列轉換為日期between
后使用
df = df.apply(pd.to_datetime)
df[df['check_date'].between(df['start_date'], df['end_date'])].index # -> Int64Index([6, 10, 11, 18], dtype='int64')
check_date start_date end_date
6 2012-11-29 09:41:00 2011-06-14 02:19:00 2017-09-17 02:21:00
10 2018-09-26 14:02:00 2012-12-31 20:51:00 2018-12-12 11:50:00
11 2014-11-13 02:09:00 2013-08-26 13:03:00 2016-08-23 06:31:00
18 2010-08-18 22:35:00 2010-01-06 15:06:00 2018-07-17 12:05:00
# convert to datetime
df.loc[:, ['check_date', 'start_date', 'end_date']] = df[['check_date', 'start_date', 'end_date']].apply(pd.to_datetime)
# filter for flag
flag = df[df['flag'] == 1].copy()
# list comprehension to check if each date is between each rows star and end
[flag[(date >= flag['start_date']) & (date <= flag['end_date'])].index.tolist() for date in flag['check_date']]
[[],
[],
[],
[0, 6, 18],
[6, 10, 12, 18],
[10, 12],
[0, 6, 10, 18, 24],
[],
[4, 18],
[18],
[0, 6, 10, 18],
[0, 6, 10, 12, 18, 24],
[6, 10, 12, 18]]
或將列表分配回框架
flag['Check'] = [flag[(date >= flag['start_date']) & (date <= flag['end_date'])].index.tolist() for date in flag['check_date']]
或 dict 理解而不是列表
{date: flag[(date >= flag['start_date']) & (date <= flag['end_date'])].index.tolist() for date in flag['check_date']}
{Timestamp('2019-10-31 22:00:00'): [],
Timestamp('2008-08-06 14:02:00'): [],
Timestamp('2008-04-19 19:38:00'): [],
Timestamp('2012-11-29 09:41:00'): [0, 6, 18],
Timestamp('2015-05-15 00:31:00'): [6, 10, 12, 18],
Timestamp('2018-09-26 14:02:00'): [10, 12],
Timestamp('2014-02-28 09:58:00'): [0, 6, 10, 18, 24],
Timestamp('2008-08-21 06:31:00'): [],
Timestamp('2010-12-16 22:39:00'): [4, 18],
Timestamp('2010-08-18 22:35:00'): [18],
Timestamp('2013-06-14 07:27:00'): [0, 6, 10, 18],
Timestamp('2014-07-03 01:02:00'): [0, 6, 10, 12, 18, 24],
Timestamp('2016-01-21 22:56:00'): [6, 10, 12, 18]}
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.