Pandas: Comparing a datetime column against datetime arrays

Question

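I'm learning Pandas and am currently working with datetimes. I'm looking for a way to select rows by a datetime column: a row should be selected if its datetime value falls within the range between the corresponding values of the arrays spacex and clonx.

The two arrays: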

clonx = array(['2019-08-14T23:32:00.000000000', '2019-08-14T23:35:00.000000000',
       '2019-08-14T23:35:00.000000000', ...,
       '2020-05-24T14:55:00.000000000', '2020-05-24T15:03:00.000000000',
       '2020-05-25T12:09:00.000000000'], dtype='datetime64[ns]')

spacex = array(['2019-08-14T23:27:00.000000000', '2019-08-14T23:30:00.000000000',
   '2019-08-14T23:30:00.000000000', ...,
   '2020-05-24T14:50:00.000000000', '2020-05-24T14:58:00.000000000',
   '2020-05-25T12:04:00.000000000'], dtype='datetime64[ns]')

The column:

    first['datim']

0      2019-08-14 23:26:00
1      2019-08-14 23:26:00
2      2019-08-14 23:27:00
3      2019-08-14 23:30:00
4      2019-08-14 23:30:00
               ...        
5101   2020-05-25 20:48:00
5102   2020-05-25 20:49:00
5103   2020-05-26 13:52:00
5104   2020-05-26 13:52:00
5105   2020-05-26 14:22:00
Name: datim, Length: 3172, dtype: datetime64[ns]

How can I get the datetime values from the column first['datim'] that fall between the spacex and clonx datetimes?

Something like this:

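# Start with an all-False mask, then OR in the mask for each (spacex[i], clonx[i]) range
final = pd.Series(False, index=first.index)
for i in range(len(spacex)):
    start_date = spacex[i]
    end_date = clonx[i]
    final |= (first['datim'] >= start_date) & (first['datim'] <= end_date)
result = first.loc[final, 'datim']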

Or maybe use between_time, but I couldn't find a way to make it work with arrays.

Thanks for your time!

Answer 1

This works, although it is not the best solution:

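datelist = []
# Iterate over the column's values rather than first.datim[i]: the index has gaps,
# so looking rows up by label with a running counter can fail.
for value in first.datim:
    # Check the value against every (spacex[j], clonx[j]) interval
    for start, end in zip(spacex, clonx):
        if (value >= start) and (value <= end):
            datelist.append(value)
print(set(datelist))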

{Timestamp('2019-08-14 23:30:00'), Timestamp('2019-08-14 23:27:00')}
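The nested loop above can probably be replaced by a single NumPy broadcast comparison. The following is only a minimal sketch, not the answerer's method: the sample values are a hypothetical subset modelled on the data in the question, and the names values, inside, mask and selected are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample modelled on the data in the question
first = pd.DataFrame({
    "datim": pd.to_datetime([
        "2019-08-14 23:26:00",
        "2019-08-14 23:27:00",
        "2019-08-14 23:30:00",
        "2020-05-25 20:48:00",
    ])
})
spacex = np.array(
    ["2019-08-14T23:27:00", "2019-08-14T23:30:00", "2020-05-25T12:04:00"],
    dtype="datetime64[ns]",
)
clonx = np.array(
    ["2019-08-14T23:32:00", "2019-08-14T23:35:00", "2020-05-25T12:09:00"],
    dtype="datetime64[ns]",
)

values = first["datim"].to_numpy()  # shape (n_rows,)

# Compare every row with every interval at once.
# values[:, None] has shape (n_rows, 1), so the result is (n_rows, n_intervals).
inside = (values[:, None] >= spacex) & (values[:, None] <= clonx)

# A row is kept if it falls inside at least one [spacex[j], clonx[j]] interval
mask = inside.any(axis=1)
selected = first.loc[mask, "datim"]
print(selected)
```

Note that the intermediate boolean matrix has n_rows × n_intervals entries, so for very long arrays it may be worth processing the column in chunks.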

Answer 2

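You can use apply to add a column to your DataFrame based on how the "datim" datetime compares with the two datetime arrays. This will not scale well to large amounts of data, since it runs a Python loop over every pair for every row, but it may be fine for you. For example, this tells you whether the time falls between any of the datetime pairs (as in @Pygirl's answer):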

def between_any(time):
    for s,c in zip(spacex, clonx):
        if (time  >= s) and (time <= c):
            return True
    return False

df['Between Any'] = df['datim'].apply(between_any)

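Or you can get the indices of the date pairs that the value falls between:

import numpy as np  # needed for np.nan below

def between_index(time):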
    output = []
    for i in range(len(spacex)):
        if (time  >= spacex[i]) and (time <= clonx[i]):
            output.append(i)
    return output if output else np.nan

df['Between Indices'] = df['datim'].apply(between_index)

Or you can get the actual timestamp pairs that the value falls between:

def between_values(time):
    output = []
    for s,c in zip(spacex, clonx):
        if (time  >= s) and (time <= c):
            output.append((s,c))
    return output if output else np.nan

df['Between Values'] = df['datim'].apply(between_values)

With your data, it looks like this:

In[0]: df

Out[0]:
                   datim
0    2019-08-14 23:26:00
1    2019-08-14 23:26:00
2    2019-08-14 23:27:00
3    2019-08-14 23:30:00
4    2019-08-14 23:30:00
5101 2020-05-25 20:48:00
5102 2020-05-25 20:49:00
5103 2020-05-26 13:52:00
5104 2020-05-26 13:52:00
5105 2020-05-26 14:22:00

In[1]:

clonx = pd.Series(['2019-08-14T23:32:00.000000000', '2019-08-14T23:35:00.000000000','2019-08-14T23:35:00.000000000','2020-05-24T14:55:00.000000000', '2020-05-24T15:03:00.000000000','2020-05-25T12:09:00.000000000'])

spacex = pd.Series(['2019-08-14T23:27:00.000000000', '2019-08-14T23:30:00.000000000','2019-08-14T23:30:00.000000000','2020-05-24T14:50:00.000000000', '2020-05-24T14:58:00.000000000','2020-05-25T12:04:00.000000000'])

clonx = pd.to_datetime(clonx)
spacex = pd.to_datetime(spacex)

df['Between Any'] = df['datim'].apply(between_any)
df['Between Indices'] = df['datim'].apply(between_index)
df['Between Values'] = df['datim'].apply(between_values)

df

Out[1]:

                   datim  Between Any Between Indices  \
0    2019-08-14 23:26:00        False             NaN   
1    2019-08-14 23:26:00        False             NaN   
2    2019-08-14 23:27:00         True             [0]   
3    2019-08-14 23:30:00         True       [0, 1, 2]   
4    2019-08-14 23:30:00         True       [0, 1, 2]   
5101 2020-05-25 20:48:00        False             NaN   
5102 2020-05-25 20:49:00        False             NaN   
5103 2020-05-26 13:52:00        False             NaN   
5104 2020-05-26 13:52:00        False             NaN   
5105 2020-05-26 14:22:00        False             NaN   

                                         Between Values  
0                                                   NaN  
1                                                   NaN  
2          [(2019-08-14 23:27:00, 2019-08-14 23:32:00)]  
3     [(2019-08-14 23:27:00, 2019-08-14 23:32:00), (...  
4     [(2019-08-14 23:27:00, 2019-08-14 23:32:00), (...  
5101                                                NaN  
5102                                                NaN  
5103                                                NaN  
5104                                                NaN  
5105                                                NaN
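If apply turns out to be too slow, the "Between Any" and "Between Indices" columns can probably be built from one broadcast comparison instead of calling a Python function per row. This is a rough sketch under assumptions: the sample rows and intervals are a hypothetical subset of the data above, and times and inside are names made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample modelled on the data shown above
df = pd.DataFrame({"datim": pd.to_datetime([
    "2019-08-14 23:26:00",
    "2019-08-14 23:27:00",
    "2019-08-14 23:30:00",
    "2020-05-25 20:48:00",
])})
spacex = pd.to_datetime(pd.Series(
    ["2019-08-14T23:27:00", "2019-08-14T23:30:00", "2019-08-14T23:30:00"]))
clonx = pd.to_datetime(pd.Series(
    ["2019-08-14T23:32:00", "2019-08-14T23:35:00", "2019-08-14T23:35:00"]))

# Column vector of row times, shape (n_rows, 1)
times = df["datim"].to_numpy()[:, None]

# inside[i, j] is True when row i lies within [spacex[j], clonx[j]]
inside = (times >= spacex.to_numpy()) & (times <= clonx.to_numpy())

# Same meaning as between_any: at least one interval contains the row
df["Between Any"] = inside.any(axis=1)

# Same meaning as between_index: list of matching interval positions, or NaN if none
df["Between Indices"] = pd.Series(
    [np.flatnonzero(row).tolist() or np.nan for row in inside],
    index=df.index,
    dtype=object,
)
print(df)
```

The per-row list comprehension is still a Python loop, but it only walks the precomputed boolean matrix; the datetime comparisons themselves are done once by NumPy.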
