How do I check if a date in a date column is between two dates in different columns using pandas?

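I am trying to figure out how to check whether a date in one column falls between the dates in two other columns. I looked at "Check if a date column is in a date range - pandas", but it is not quite the same problem.

I use a unique identifier for each row to decide whether that row's date needs to be checked.

If the date falls between the two dates, I want to append that row's unique identifier to a list.

In the example I use 'index' as the unique identifier.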

import numpy as np
import pandas as pd

date_dict = {'check_date': ['10/31/2019 10:00 PM',
  '11/10/2012 06:02 PM',
  '08/06/2008 02:02 PM',
  '05/13/2009 12:19 PM',
  '04/19/2008 07:38 PM',
  '10/08/2012 01:12 PM',
  '11/29/2012 09:41 AM',
  '08/03/2016 02:05 AM',
  '05/15/2015 12:31 AM',
  '04/05/2016 10:21 AM',
  '09/26/2018 02:02 PM',
  '11/13/2014 02:09 AM',
  '02/28/2014 09:58 AM',
  '10/02/2015 08:25 PM',
  '08/21/2008 06:31 AM',
  '05/31/2017 03:48 AM',
  '12/16/2010 10:39 PM',
  '12/05/2008 08:57 AM',
  '08/18/2010 10:35 PM',
  '07/06/2010 12:25 AM',
  '06/14/2013 07:27 AM',
  '09/27/2015 11:06 PM',
  '07/03/2014 01:02 AM',
  '09/18/2009 04:26 PM',
  '01/21/2016 10:56 PM'],
 'start_date': ['02/24/2012 12:57 PM',
  '09/25/2017 11:35 PM',
  '07/05/2015 10:58 PM',
  '04/26/2017 04:26 AM',
  '09/03/2010 10:50 AM',
  '07/08/2017 10:17 AM',
  '06/14/2011 02:19 AM',
  '03/21/2009 10:11 AM',
  '10/22/2012 12:39 AM',
  '11/09/2008 05:20 PM',
  '12/31/2012 08:51 PM',
  '08/26/2013 01:03 PM',
  '05/21/2014 01:48 AM',
  '11/11/2009 11:55 PM',
  '04/23/2012 10:14 AM',
  '11/23/2009 09:26 AM',
  '08/20/2010 02:13 PM',
  '08/09/2019 01:00 AM',
  '01/06/2010 03:06 PM',
  '02/23/2016 08:23 PM',
  '10/30/2019 03:20 AM',
  '06/12/2013 06:25 PM',
  '02/03/2019 05:46 PM',
  '08/07/2011 02:50 PM',
  '06/18/2013 03:59 AM'],
 'end_date': ['09/06/2014 03:03 AM',
  '08/24/2012 12:30 PM',
  '05/29/2008 05:48 AM',
  '12/31/2014 01:00 AM',
  '12/06/2011 05:47 PM',
  '04/28/2013 07:01 PM',
  '09/17/2017 02:21 AM',
  '06/23/2008 03:45 PM',
  '01/24/2011 03:04 PM',
  '08/05/2015 02:10 AM',
  '12/12/2018 11:50 AM',
  '08/23/2016 06:31 AM',
  '11/21/2018 08:49 AM',
  '12/05/2009 03:31 PM',
  '04/16/2010 09:24 PM',
  '09/08/2012 12:29 PM',
  '11/09/2009 08:08 AM',
  '11/13/2016 04:21 AM',
  '07/17/2018 12:05 PM',
  '05/03/2012 06:27 AM',
  '09/04/2012 09:11 PM',
  '06/26/2014 06:55 AM',
  '09/19/2016 08:48 PM',
  '05/02/2018 09:03 AM',
  '03/22/2015 04:20 AM']}
df = pd.DataFrame(date_dict)
df.reset_index(inplace = True)
df['flag'] = np.where(df['index'] % 2 == 0, 1, 0)

df_list = list(df[df['flag'] == 1]['index'])
analyst_list = []
for flag in df_list:
    min_date = df[df['index'] == flag]['check_date']
    for index, row in df.iterrows():
        start = row['start_date']
        end = row['end_date']
        if min_date > start and min_date <= end :
            analyst_list.append(row['index'])
        else:
            pass

When I run the code above, I get the following error, which I can't get past.

Traceback (most recent call last):

  File "<ipython-input-112-fecfeaa05d6d>", line 8, in <module>
    if min_date > start and min_date <= end :

  File "C:\Users\JORDAN.HOWELL.GITDIR\AppData\Local\Continuum\anaconda3\envs\stan_env\lib\site-packages\pandas\core\generic.py", line 1330, in __nonzero__
    f"The truth value of a {type(self).__name__} is ambiguous. "

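I'm not sure what is ambiguous about the dates. I tried adding `.values` to `row['start_date']`, `row['end_date']` and `row['check_date']`, but it didn't help.

Does anyone know how to do this, or what my problem is?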

Use `between` after converting the columns to datetime:

# assumes df holds only the three date columns, i.e. df = pd.DataFrame(date_dict)
df = df.apply(pd.to_datetime)
df[df['check_date'].between(df['start_date'], df['end_date'])].index # -> Int64Index([6, 10, 11, 18], dtype='int64')

            check_date          start_date            end_date
6  2012-11-29 09:41:00 2011-06-14 02:19:00 2017-09-17 02:21:00
10 2018-09-26 14:02:00 2012-12-31 20:51:00 2018-12-12 11:50:00
11 2014-11-13 02:09:00 2013-08-26 13:03:00 2016-08-23 06:31:00
18 2010-08-18 22:35:00 2010-01-06 15:06:00 2018-07-17 12:05:00
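
As for why the loop in the question fails: `df[df['index'] == flag]['check_date']` is a one-element Series rather than a scalar, so `min_date > start` produces a boolean Series, and `and` then asks pandas for a single truth value, which raises the "ambiguous" error. Below is a minimal sketch of the problem and one possible fix. The three-row frame is made up for illustration and is not the question's data; the last step keeps the question's strict lower bound (`>`) and inclusive upper bound (`<=`).

```python
import pandas as pd

# Small illustrative frame; the values are made up for this sketch.
df = pd.DataFrame({
    "check_date": ["01/05/2020 10:00 AM", "03/01/2020 09:00 AM", "06/15/2020 01:00 PM"],
    "start_date": ["01/01/2020 12:00 AM", "04/01/2020 12:00 AM", "06/01/2020 12:00 AM"],
    "end_date":   ["01/31/2020 11:59 PM", "04/30/2020 11:59 PM", "06/30/2020 11:59 PM"],
}).apply(pd.to_datetime)

selected = df.loc[df.index == 0, "check_date"]  # a Series of length 1, not a Timestamp
start = df.loc[0, "start_date"]                 # scalar Timestamp
end = df.loc[0, "end_date"]                     # scalar Timestamp

try:
    # `and` calls bool() on the Series produced by the first comparison
    if selected > start and selected <= end:
        pass
    raised = False
except ValueError:
    raised = True

# Pulling the scalar out of the Series makes this an ordinary comparison.
min_date = selected.iloc[0]
in_range = start < min_date <= end

# The same test for every row at once, combined with & instead of `and`.
mask = (df["check_date"] > df["start_date"]) & (df["check_date"] <= df["end_date"])
matching_ids = df[mask].index.tolist()
```

The elementwise `&` is what lets the row-wise version avoid the explicit loop entirely.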

Update

# convert the date columns to datetime
date_cols = ['check_date', 'start_date', 'end_date']
df[date_cols] = df[date_cols].apply(pd.to_datetime)
# keep only the flagged rows
flag = df[df['flag'] == 1].copy()
# for each check date, list the flagged rows whose start/end range contains it
[flag[(date >= flag['start_date']) & (date <= flag['end_date'])].index.tolist() for date in flag['check_date']]

[[],
 [],
 [],
 [0, 6, 18],
 [6, 10, 12, 18],
 [10, 12],
 [0, 6, 10, 18, 24],
 [],
 [4, 18],
 [18],
 [0, 6, 10, 18],
 [0, 6, 10, 12, 18, 24],
 [6, 10, 12, 18]]
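
The list comprehension above runs one boolean filter per check date. For larger frames, the same pairwise comparison could be expressed with NumPy broadcasting. This is only a sketch and not part of the original answer: the helper name `ids_containing_each_date` and the sample frame are made up for illustration, and it builds an n × n boolean matrix, so memory use grows quadratically with the number of rows.

```python
import pandas as pd

def ids_containing_each_date(frame):
    """For each check_date, list the index labels of the rows whose
    inclusive [start_date, end_date] range contains that date."""
    check = frame["check_date"].to_numpy()[:, None]  # shape (n, 1)
    start = frame["start_date"].to_numpy()[None, :]  # shape (1, n)
    end = frame["end_date"].to_numpy()[None, :]      # shape (1, n)
    inside = (check >= start) & (check <= end)       # shape (n, n), one row per check date
    labels = frame.index.to_numpy()
    return [labels[row].tolist() for row in inside]

# Hypothetical stand-in for the filtered `flag` frame.
flag = pd.DataFrame(
    {
        "check_date": pd.to_datetime(["2020-01-15", "2020-02-10", "2021-01-01"]),
        "start_date": pd.to_datetime(["2020-01-01", "2020-01-10", "2020-02-01"]),
        "end_date": pd.to_datetime(["2020-01-31", "2020-03-01", "2020-02-28"]),
    },
    index=[0, 2, 4],
)

result = ids_containing_each_date(flag)
```

The result has the same shape as the list comprehension's output: one list of matching index labels per check date, in row order.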

Or assign the list back to the frame:

flag['Check'] = [flag[(date >= flag['start_date']) & (date <= flag['end_date'])].index.tolist() for date in flag['check_date']]

Or use a dict comprehension instead of a list:

{date: flag[(date >= flag['start_date']) & (date <= flag['end_date'])].index.tolist() for date in flag['check_date']}

{Timestamp('2019-10-31 22:00:00'): [],
 Timestamp('2008-08-06 14:02:00'): [],
 Timestamp('2008-04-19 19:38:00'): [],
 Timestamp('2012-11-29 09:41:00'): [0, 6, 18],
 Timestamp('2015-05-15 00:31:00'): [6, 10, 12, 18],
 Timestamp('2018-09-26 14:02:00'): [10, 12],
 Timestamp('2014-02-28 09:58:00'): [0, 6, 10, 18, 24],
 Timestamp('2008-08-21 06:31:00'): [],
 Timestamp('2010-12-16 22:39:00'): [4, 18],
 Timestamp('2010-08-18 22:35:00'): [18],
 Timestamp('2013-06-14 07:27:00'): [0, 6, 10, 18],
 Timestamp('2014-07-03 01:02:00'): [0, 6, 10, 12, 18, 24],
 Timestamp('2016-01-21 22:56:00'): [6, 10, 12, 18]}
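
One caveat when keying the dict by timestamp: if two rows share the same `check_date`, the later entry silently overwrites the earlier one. A possible alternative is to key by the row's index label instead. The sketch below is not from the original answer and uses a small made-up frame in which two rows deliberately share a check date.

```python
import pandas as pd

# Hypothetical stand-in for the filtered `flag` frame; rows 0 and 2 share a check_date.
flag = pd.DataFrame(
    {
        "check_date": pd.to_datetime(["2020-01-15", "2020-01-15", "2020-02-10"]),
        "start_date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-01-10"]),
        "end_date": pd.to_datetime(["2020-01-31", "2020-02-28", "2020-03-01"]),
    },
    index=[0, 2, 4],
)

# Series.items() yields (index label, value) pairs, so every row keeps its own key.
matches_by_id = {
    idx: flag[(date >= flag["start_date"]) & (date <= flag["end_date"])].index.tolist()
    for idx, date in flag["check_date"].items()
}
```

Keying by index label also ties each result back to the unique identifier the question wants to collect.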

