Comparing numpy array of datetimes with timestamp
I don't understand why this code throws an error. Isn't '>' vectorized? I can see there is no problem with the
x_month_begin[0,0] > st_d
comparison. I would appreciate insights and suggestions for a fix.
import pandas as pd
import numpy as np
import datetime
end_d = pd.to_datetime('23/02/2018', format="%d/%m/%Y")
x_month_begin = pd.date_range(datetime.datetime(year=end_d.year-1, month=1, day=1),
datetime.datetime(year=end_d.year+1, month=12, day=1), freq='MS')
# stacking with each row for each year
x_month_begin = np.vstack(np.split(x_month_begin, 3))
# transposing for each column to be a year
x_month_begin = np.transpose(x_month_begin)
st_d = pd.to_datetime('01/2016', format="%m/%Y")
x_month_begin > st_d
If you check the types of the objects, the problem becomes clear.
x_month_begin: a 2-dimensional numpy array with 3 columns (check x_month_begin.shape)
st_d: a pandas Timestamp
The two cannot be compared directly. For the comparison, you can do something like this:
[y > st_d for x in x_month_begin for y in x ]
So you start off with a pandas structure:
In [133]: x_month_begin
Out[133]:
DatetimeIndex(['2020-01-01', '2020-02-01', '2020-03-01', '2020-04-01',
'2020-05-01', '2020-06-01', '2020-07-01', '2020-08-01',
'2020-09-01', '2020-10-01', '2020-11-01', '2020-12-01'],
dtype='datetime64[ns]', freq='MS')
the same thing as a numpy array:
In [134]: x_month_begin.values
Out[134]:
array(['2020-01-01T00:00:00.000000000', '2020-02-01T00:00:00.000000000',
'2020-03-01T00:00:00.000000000', '2020-04-01T00:00:00.000000000',
'2020-05-01T00:00:00.000000000', '2020-06-01T00:00:00.000000000',
'2020-07-01T00:00:00.000000000', '2020-08-01T00:00:00.000000000',
'2020-09-01T00:00:00.000000000', '2020-10-01T00:00:00.000000000',
'2020-11-01T00:00:00.000000000', '2020-12-01T00:00:00.000000000'],
dtype='datetime64[ns]')
You manipulate that into a (n,3) array (I suspect this can be done more directly with a reshape and possibly a transpose):
In [135]: x_month_begin = np.vstack(np.split(x_month_begin, 3))
In [138]: x_month_begin = np.transpose(x_month_begin)
In [139]: x_month_begin
Out[139]:
array([['2020-01-01T00:00:00.000000000', '2020-05-01T00:00:00.000000000',
'2020-09-01T00:00:00.000000000'],
['2020-02-01T00:00:00.000000000', '2020-06-01T00:00:00.000000000',
'2020-10-01T00:00:00.000000000'],
['2020-03-01T00:00:00.000000000', '2020-07-01T00:00:00.000000000',
'2020-11-01T00:00:00.000000000'],
['2020-04-01T00:00:00.000000000', '2020-08-01T00:00:00.000000000',
'2020-12-01T00:00:00.000000000']], dtype='datetime64[ns]')
In [140]: _.shape
Out[140]: (4, 3)
Anyway, now your comparison:
In [141]: st_d = pd.to_datetime('01/2016', format="%m/%Y")
In [142]: st_d
Out[142]: Timestamp('2016-01-01 00:00:00')
In [143]: x_month_begin >st_d
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-143-30567662e59d> in <module>
----> 1 x_month_begin >st_d
pandas/_libs/tslibs/c_timestamp.pyx in pandas._libs.tslibs.c_timestamp._Timestamp.__richcmp__()
TypeError: '<' not supported between instances of 'Timestamp' and 'int'
numpy arrays can do < comparisons, but they have certain rules about which dtypes are compatible (e.g. comparing strings and numbers doesn't work). In addition, pandas plays its own games with dates and times: some formats are internal, and some are compatible with the numpy datetime64.
For example, if we convert your timestamp to a numpy equivalent:
In [144]: st_d.to_numpy()
Out[144]: numpy.datetime64('2016-01-01T00:00:00.000000000')
the comparison works:
In [145]: x_month_begin>st_d.to_numpy()
Out[145]:
array([[ True, True, True],
[ True, True, True],
[ True, True, True],
[ True, True, True]])
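The conversion step above can be sketched as a self-contained script. This is only an illustration under assumptions: the month grid is rebuilt with a reshape rather than the original split/stack, and the `cutoff` value is made up to show a mask that is not all True.

```python
import numpy as np
import pandas as pd

# Rebuild a (4, 3) month grid like the one in the answer.
months = pd.date_range("2020-01-01", periods=12, freq="MS")
grid = months.to_numpy().reshape(-1, 4).T

# Convert the Timestamp to numpy.datetime64 before comparing, so numpy
# sees a compatible dtype and broadcasts the scalar over the array.
st_d = pd.to_datetime("01/2016", format="%m/%Y")
mask = grid > st_d.to_numpy()

# Illustrative cutoff inside the range (an assumption, not from the question).
cutoff = pd.Timestamp("2020-06-15")
late_months = grid > cutoff.to_numpy()

print(mask.shape, bool(mask.all()))
print(int(late_months.sum()))
```

The result keeps the shape of the array, unlike a flat list built by looping over the elements.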
pandas is built on numpy, or at least uses numpy arrays to store its data. But none of the numpy code is pandas aware. If given a non-numpy object it will try, naively, to convert it, e.g.
In [146]: np.asarray(st_d)
Out[146]: array(Timestamp('2016-01-01 00:00:00'), dtype=object)
is different from Out[144]. [146] is the conversion that produces your error.
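A quick way to see the difference between the two conversions is to inspect the dtypes. This is a minimal sketch; the variable names `as_object` and `as_dt64` are only for illustration.

```python
import numpy as np
import pandas as pd

st_d = pd.Timestamp("2016-01-01")

# Generic conversion: numpy wraps the Timestamp in a 0-d object array.
as_object = np.asarray(st_d)

# Explicit conversion: pandas hands back a numpy.datetime64 scalar.
as_dt64 = st_d.to_numpy()

print(as_object.dtype)
print(as_dt64.dtype)
```

Only the second form has a datetime64 dtype that numpy can compare natively with a datetime64 array.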
The original `DatetimeIndex` can be tested against the timestamp. That's a 'pure' pandas operation.
In [152]: _133>st_d
Out[152]:
array([ True, True, True, True, True, True, True, True, True,
True, True, True])
_133.to_numpy().reshape(-1,4).T
gives the x_month_begin array directly.
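As a rough check that the reshape matches the split/stack/transpose route, here is a sketch applied to the 36-month range from the question. The helper name `months_by_year` is hypothetical, and the code assumes the number of values divides evenly by the number of years.

```python
import numpy as np
import pandas as pd


def months_by_year(values, n_years):
    """Return a 2-D array with one column per year (hypothetical helper)."""
    # One row per year after the reshape, then transpose to one column per year.
    return np.asarray(values).reshape(n_years, -1).T


# Same range as the question: Jan 2017 through Dec 2019, month starts.
values = pd.date_range("2017-01-01", "2019-12-01", freq="MS").to_numpy()

via_split = np.transpose(np.vstack(np.split(values, 3)))
via_reshape = months_by_year(values, 3)

print(via_reshape.shape)
print(np.array_equal(via_split, via_reshape))
```

Calling `.to_numpy()` first keeps everything in plain numpy, so the result does not depend on how `np.split` treats a pandas index.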