Pandas: Compare Datetime column on Datetime arrays
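I'm learning Pandas and am currently working with datetimes. I'm looking for a way to select rows by a datetime column: a row should be selected if its datetime value falls within the range between the values in the arrays spacex and clonx.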
The two arrays:
clonx = array(['2019-08-14T23:32:00.000000000', '2019-08-14T23:35:00.000000000',
'2019-08-14T23:35:00.000000000', ...,
'2020-05-24T14:55:00.000000000', '2020-05-24T15:03:00.000000000',
'2020-05-25T12:09:00.000000000'], dtype='datetime64[ns]')
spacex = array(['2019-08-14T23:27:00.000000000', '2019-08-14T23:30:00.000000000',
'2019-08-14T23:30:00.000000000', ...,
'2020-05-24T14:50:00.000000000', '2020-05-24T14:58:00.000000000',
'2020-05-25T12:04:00.000000000'], dtype='datetime64[ns]')
The column:
first['datim']
0      2019-08-14 23:26:00
1      2019-08-14 23:26:00
2      2019-08-14 23:27:00
3      2019-08-14 23:30:00
4      2019-08-14 23:30:00
               ...
5101   2020-05-25 20:48:00
5102   2020-05-25 20:49:00
5103   2020-05-26 13:52:00
5104   2020-05-26 13:52:00
5105   2020-05-26 14:22:00
Name: datim, Length: 3172, dtype: datetime64[ns]
How can I get the datetime values from the column first['datim'] that lie between the spacex and clonx datetimes?
Something like this:
for i in range(len(spacex)):
    start_date = spacex[i]
    end_date = clonx[i]
    final = (first['datim'] >= start_date) & (first['datim'] <= end_date)
    # result: the rows selected by final
Or maybe use between_time, but I couldn't find a way to make it work with arrays.
Thanks for your time!
Not the best solution, though:
datelist = []
for i in range(len(first.datim)):
    t = first.datim.iloc[i]  # positional access; the index labels are not contiguous
    for j in range(len(clonx)):
        if (spacex[j] <= t) and (t <= clonx[j]):
            datelist.append(t)
print(set(datelist))
{Timestamp('2019-08-14 23:30:00'), Timestamp('2019-08-14 23:27:00')}
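The loop the question was reaching for can also be written with one vectorized comparison per (spacex, clonx) pair, OR-ing the results into a single boolean mask. The following is only a minimal sketch under stated assumptions: the small `first`, `spacex` and `clonx` objects are stand-ins built from a few values shown in the question (the full arrays are elided there), and `mask` and `selected` are names chosen for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in data: a few values copied from the question, not the full dataset.
first = pd.DataFrame({"datim": pd.to_datetime([
    "2019-08-14 23:26:00", "2019-08-14 23:27:00",
    "2019-08-14 23:30:00", "2020-05-25 20:48:00",
])})
spacex = np.array(["2019-08-14T23:27", "2019-08-14T23:30"], dtype="datetime64[ns]")
clonx = np.array(["2019-08-14T23:32", "2019-08-14T23:35"], dtype="datetime64[ns]")

# Start from an all-False mask and OR in one comparison per (start, end) pair.
mask = pd.Series(False, index=first.index)
for start, end in zip(spacex, clonx):
    # Series.between is inclusive on both ends by default.
    mask |= first["datim"].between(start, end)

# Rows whose datetime lies inside at least one pair.
selected = first.loc[mask, "datim"]
print(selected)
```

This still loops over the pairs in Python, but each iteration compares the whole column at once instead of one element at a time.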
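You can use apply to add columns to your DataFrame based on how the 'datim' datetime compares to the two datetime arrays. This does not scale well to large amounts of data, since every row is checked against every pair in Python, but it may be fine for you. For example, this tells you whether the time is between any of the datetime pairs (as in @Pygirl's answer):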
def between_any(time):
    for s, c in zip(spacex, clonx):
        if (time >= s) and (time <= c):
            return True
    return False

df['Between Any'] = df['datim'].apply(between_any)
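If the row-by-row apply becomes too slow, the same "between any pair" check can be sketched with NumPy broadcasting. This is an alternative technique, not part of the original answer: `between_any_vectorized` is a hypothetical helper name, the data is the sample shown in this thread with a default integer index, and the intermediate boolean matrix has one entry per (row, pair) combination, so memory grows with rows × pairs.

```python
import numpy as np
import pandas as pd

def between_any_vectorized(times, starts, ends):
    """Boolean array: True where a time lies in at least one [start, end] pair."""
    t = np.asarray(times, dtype="datetime64[ns]")[:, None]   # shape (n_times, 1)
    s = np.asarray(starts, dtype="datetime64[ns]")[None, :]  # shape (1, n_pairs)
    e = np.asarray(ends, dtype="datetime64[ns]")[None, :]    # shape (1, n_pairs)
    inside = (t >= s) & (t <= e)                             # shape (n_times, n_pairs)
    return inside.any(axis=1)

# Sample values from the thread (default 0..9 index assumed for simplicity).
df = pd.DataFrame({"datim": pd.to_datetime([
    "2019-08-14 23:26:00", "2019-08-14 23:26:00", "2019-08-14 23:27:00",
    "2019-08-14 23:30:00", "2019-08-14 23:30:00", "2020-05-25 20:48:00",
    "2020-05-25 20:49:00", "2020-05-26 13:52:00", "2020-05-26 13:52:00",
    "2020-05-26 14:22:00",
])})
spacex = pd.to_datetime(pd.Series([
    "2019-08-14T23:27:00", "2019-08-14T23:30:00", "2019-08-14T23:30:00",
    "2020-05-24T14:50:00", "2020-05-24T14:58:00", "2020-05-25T12:04:00",
]))
clonx = pd.to_datetime(pd.Series([
    "2019-08-14T23:32:00", "2019-08-14T23:35:00", "2019-08-14T23:35:00",
    "2020-05-24T14:55:00", "2020-05-24T15:03:00", "2020-05-25T12:09:00",
]))

df["Between Any"] = between_any_vectorized(df["datim"], spacex, clonx)
print(df)
```

For very long columns and many pairs, the (n_times, n_pairs) matrix may be too large to hold at once; processing the column in chunks would be one way around that.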
Or you can get the indices of the date pairs that the value falls between:
import numpy as np

def between_index(time):
    output = []
    for i in range(len(spacex)):
        if (time >= spacex[i]) and (time <= clonx[i]):
            output.append(i)
    # NaN marks rows that fall inside no pair
    return output if output else np.nan

df['Between Indices'] = df['datim'].apply(between_index)
Or you can get the actual timestamps that the value falls between:
def between_values(time):
    output = []
    for s, c in zip(spacex, clonx):
        if (time >= s) and (time <= c):
            output.append((s, c))
    return output if output else np.nan  # np is numpy (import numpy as np)

df['Between Values'] = df['datim'].apply(between_values)
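Storing Python lists inside DataFrame cells can be awkward to filter or join on later. As one possible alternative (an assumption about what might be convenient, not something the answer proposes), the matching pair indices and their timestamps can be collected into a long-format table with one line per (row, pair) match. The names `matches`, `row_pos` and `pair_pos` are illustrative, and the data is a small stand-in taken from the thread.

```python
import numpy as np
import pandas as pd

# Stand-in data: a few values from the thread.
df = pd.DataFrame({"datim": pd.to_datetime([
    "2019-08-14 23:26:00", "2019-08-14 23:27:00", "2019-08-14 23:30:00",
])})
spacex = pd.to_datetime(pd.Series(["2019-08-14T23:27:00", "2019-08-14T23:30:00"]))
clonx = pd.to_datetime(pd.Series(["2019-08-14T23:32:00", "2019-08-14T23:35:00"]))

# Compare every row against every pair via broadcasting.
t = df["datim"].to_numpy()[:, None]   # shape (n_rows, 1)
s = spacex.to_numpy()[None, :]        # shape (1, n_pairs)
e = clonx.to_numpy()[None, :]         # shape (1, n_pairs)

# Positions of all (row, pair) combinations where start <= time <= end.
row_pos, pair_pos = np.nonzero((t >= s) & (t <= e))

# One line per match: the row label, the pair index, and the pair's bounds.
matches = pd.DataFrame({
    "row": df.index[row_pos].to_numpy(),
    "pair": pair_pos,
    "start": spacex.to_numpy()[pair_pos],
    "end": clonx.to_numpy()[pair_pos],
})
print(matches)
```

Rows that fall inside no pair simply do not appear in `matches`, which plays the role of the NaN entries in the apply-based columns.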
With your data, it looks like this:
In[0]: df
Out[0]:
                   datim
0    2019-08-14 23:26:00
1    2019-08-14 23:26:00
2    2019-08-14 23:27:00
3    2019-08-14 23:30:00
4    2019-08-14 23:30:00
5101 2020-05-25 20:48:00
5102 2020-05-25 20:49:00
5103 2020-05-26 13:52:00
5104 2020-05-26 13:52:00
5105 2020-05-26 14:22:00
In[1]:
clonx = pd.Series(['2019-08-14T23:32:00.000000000', '2019-08-14T23:35:00.000000000','2019-08-14T23:35:00.000000000','2020-05-24T14:55:00.000000000', '2020-05-24T15:03:00.000000000','2020-05-25T12:09:00.000000000'])
spacex = pd.Series(['2019-08-14T23:27:00.000000000', '2019-08-14T23:30:00.000000000','2019-08-14T23:30:00.000000000','2020-05-24T14:50:00.000000000', '2020-05-24T14:58:00.000000000','2020-05-25T12:04:00.000000000'])
clonx = pd.to_datetime(clonx)
spacex = pd.to_datetime(spacex)
df['Between Any'] = df['datim'].apply(between_any)
df['Between Indices'] = df['datim'].apply(between_index)
df['Between Values'] = df['datim'].apply(between_values)
df
Out[1]:
                   datim  Between Any Between Indices  \
0    2019-08-14 23:26:00        False             NaN
1    2019-08-14 23:26:00        False             NaN
2    2019-08-14 23:27:00         True             [0]
3    2019-08-14 23:30:00         True       [0, 1, 2]
4    2019-08-14 23:30:00         True       [0, 1, 2]
5101 2020-05-25 20:48:00        False             NaN
5102 2020-05-25 20:49:00        False             NaN
5103 2020-05-26 13:52:00        False             NaN
5104 2020-05-26 13:52:00        False             NaN
5105 2020-05-26 14:22:00        False             NaN

                                         Between Values
0                                                   NaN
1                                                   NaN
2          [(2019-08-14 23:27:00, 2019-08-14 23:32:00)]
3     [(2019-08-14 23:27:00, 2019-08-14 23:32:00), (...
4     [(2019-08-14 23:27:00, 2019-08-14 23:32:00), (...
5101                                                NaN
5102                                                NaN
5103                                                NaN
5104                                                NaN
5105                                                NaN