
Comparing numpy array of datetimes with timestamp

I don't understand why this code throws an error. Isn't `>` vectorized? I can see that the comparison `x_month_begin[0,0] > st_d` works without a problem. I would appreciate insights and suggestions for a fix.

    import pandas as pd
    import numpy as np
    import datetime

    end_d = pd.to_datetime('23/02/2018', format="%d/%m/%Y")

    x_month_begin = pd.date_range(datetime.datetime(year=end_d.year-1, month=1, day=1),
                                  datetime.datetime(year=end_d.year+1, month=12, day=1), freq='MS')

    # stacking with each row for each year
    x_month_begin = np.vstack(np.split(x_month_begin, 3))
    # transposing for each column to be a year
    x_month_begin = np.transpose(x_month_begin)

    st_d = pd.to_datetime('01/2016', format="%m/%Y")

    x_month_begin > st_d

If you check the types of the objects, the problem becomes clear:

  1. x_month_begin: is a 2-dimensional numpy array of dtype datetime64[ns] (check x_month_begin.shape)
  2. st_d: is a pandas Timestamp

The two cannot be compared directly with `>`. Instead, you can compare element by element, for example:

    [y > st_d for x in x_month_begin for y in x]
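
The one-liner above returns a flat list of 36 booleans. A minimal sketch of a variant that keeps the (12, 3) layout, assuming the question's 2017 to 2019 date range; wrapping each element in `pd.Timestamp` is an addition of this sketch so that both sides of `>` are pandas objects:

```python
import numpy as np
import pandas as pd

# Rebuild the question's (12, 3) array of month starts: one column per year, 2017-2019.
months = pd.date_range("2017-01-01", "2019-12-01", freq="MS")
x_month_begin = np.vstack(np.split(months.to_numpy(), 3)).T
st_d = pd.to_datetime("01/2016", format="%m/%Y")

# Nested comprehension: one inner list per row, so the (12, 3) layout survives.
# Each numpy.datetime64 element is converted to a Timestamp before comparing.
mask = np.array([[pd.Timestamp(y) > st_d for y in row] for row in x_month_begin])
print(mask.shape)  # (12, 3)
```

This is a Python-level loop, so it is slower than a vectorized comparison on large arrays.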

So you start off with a pandas structure (here just the 12 months of 2020, to keep the output short):

    In [133]: x_month_begin
    Out[133]:
    DatetimeIndex(['2020-01-01', '2020-02-01', '2020-03-01', '2020-04-01',
                   '2020-05-01', '2020-06-01', '2020-07-01', '2020-08-01',
                   '2020-09-01', '2020-10-01', '2020-11-01', '2020-12-01'],
                  dtype='datetime64[ns]', freq='MS')

The same data as a numpy array:

    In [134]: x_month_begin.values
    Out[134]:
    array(['2020-01-01T00:00:00.000000000', '2020-02-01T00:00:00.000000000',
           '2020-03-01T00:00:00.000000000', '2020-04-01T00:00:00.000000000',
           '2020-05-01T00:00:00.000000000', '2020-06-01T00:00:00.000000000',
           '2020-07-01T00:00:00.000000000', '2020-08-01T00:00:00.000000000',
           '2020-09-01T00:00:00.000000000', '2020-10-01T00:00:00.000000000',
           '2020-11-01T00:00:00.000000000', '2020-12-01T00:00:00.000000000'],
          dtype='datetime64[ns]')

You manipulate that into an (n, 3) array (I suspect this can be done more directly with a reshape and possibly a transpose):

    In [135]: x_month_begin = np.vstack(np.split(x_month_begin, 3))
    In [138]: x_month_begin = np.transpose(x_month_begin)
    In [139]: x_month_begin
    Out[139]:
    array([['2020-01-01T00:00:00.000000000', '2020-05-01T00:00:00.000000000',
            '2020-09-01T00:00:00.000000000'],
           ['2020-02-01T00:00:00.000000000', '2020-06-01T00:00:00.000000000',
            '2020-10-01T00:00:00.000000000'],
           ['2020-03-01T00:00:00.000000000', '2020-07-01T00:00:00.000000000',
            '2020-11-01T00:00:00.000000000'],
           ['2020-04-01T00:00:00.000000000', '2020-08-01T00:00:00.000000000',
            '2020-12-01T00:00:00.000000000']], dtype='datetime64[ns]')
    In [140]: _.shape
    Out[140]: (4, 3)

Anyway, now your comparison:

    In [141]: st_d = pd.to_datetime('01/2016', format="%m/%Y")
    In [142]: st_d
    Out[142]: Timestamp('2016-01-01 00:00:00')
    In [143]: x_month_begin >st_d
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-143-30567662e59d> in <module>
    ----> 1 x_month_begin >st_d

    pandas/_libs/tslibs/c_timestamp.pyx in pandas._libs.tslibs.c_timestamp._Timestamp.__richcmp__()

    TypeError: '<' not supported between instances of 'Timestamp' and 'int'

numpy arrays can do `<` comparisons, but numpy has rules about which dtypes are compatible (e.g. comparing strings and numbers doesn't work). In addition, pandas has its own handling of dates and times: some of its formats are internal, while others are compatible with numpy's `datetime64`.
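
A small sketch of numpy's own rule for datetimes, with illustrative values that are not from the question: two `datetime64` operands with different units are promoted to a common unit, so the comparison stays inside numpy. Using `np.datetime64('2016-01')` in place of the pandas Timestamp is an assumption of this sketch:

```python
import numpy as np

# A datetime64[ns] array, like the one built in the question.
arr = np.array(["2015-06-01", "2020-01-01"], dtype="datetime64[ns]")

# A numpy datetime64 scalar with a coarser unit (months) is still comparable:
# numpy promotes both operands to a common datetime64 unit before comparing.
cutoff = np.datetime64("2016-01")
print(cutoff.dtype)  # datetime64[M]
print(arr > cutoff)  # [False  True]
```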

For example, if we convert your timestamp to its numpy equivalent:

    In [144]: st_d.to_numpy()
    Out[144]: numpy.datetime64('2016-01-01T00:00:00.000000000')

the comparison works:

    In [145]: x_month_begin>st_d.to_numpy()
    Out[145]:
    array([[ True,  True,  True],
           [ True,  True,  True],
           [ True,  True,  True],
           [ True,  True,  True]])
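
Applied to the full data from the question, the fix is a one-line change. A sketch assuming a hypothetical cutoff of June 2018 (instead of the question's January 2016, where every entry would be True) so that the mask is not uniform:

```python
import numpy as np
import pandas as pd

end_d = pd.to_datetime("23/02/2018", format="%d/%m/%Y")
months = pd.date_range(pd.Timestamp(end_d.year - 1, 1, 1),
                       pd.Timestamp(end_d.year + 1, 12, 1), freq="MS")
# (12, 3) array: one column per year (2017, 2018, 2019), as in the question.
x_month_begin = np.vstack(np.split(months.to_numpy(), 3)).T

st_d = pd.to_datetime("06/2018", format="%m/%Y")  # hypothetical cutoff

# Convert the Timestamp to numpy.datetime64 so the comparison stays inside numpy.
mask = x_month_begin > st_d.to_numpy()
print(mask[:, 0].any(), mask[:, 2].all())  # False True
```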

pandas is built on numpy, or at least uses numpy arrays to store its data. But numpy code is not pandas-aware: if given a non-numpy object, it will naively try to convert it, e.g.

    In [146]: np.asarray(st_d)
    Out[146]: array(Timestamp('2016-01-01 00:00:00'), dtype=object)

This is different from Out[144]. The object-dtype conversion in [146] is what produces your error.
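
A short sketch of the two conversions involved. Reading the traceback, a plausible explanation (not verified against pandas internals, and newer pandas versions may handle this comparison differently) is that the comparison falls back to object mode, where `datetime64[ns]` elements become plain integers because `datetime.datetime` cannot store nanoseconds; a Timestamp compared with an int then raises the TypeError:

```python
import numpy as np
import pandas as pd

st_d = pd.Timestamp("2016-01-01")

# numpy does not know about Timestamp, so asarray wraps it in a 0-d object array.
wrapped = np.asarray(st_d)
print(wrapped.dtype)  # object

# Converting datetime64[ns] values to Python objects yields plain ints
# (nanoseconds since the epoch), not datetime objects.
arr = np.array(["2020-01-01"], dtype="datetime64[ns]")
print(type(arr.astype(object)[0]))  # <class 'int'>
```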

The original `DatetimeIndex` can be compared against the timestamp directly. That's a 'pure' pandas operation:

    In [152]: _133>st_d
    Out[152]:
    array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
            True,  True,  True])
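
If the (n, 3) layout is only needed for the result, one option, sketched here with a hypothetical mid-June cutoff, is to compare while the data is still a `DatetimeIndex` and reshape the boolean mask afterwards:

```python
import numpy as np
import pandas as pd

months = pd.date_range("2020-01-01", "2020-12-01", freq="MS")  # as in In [133]
st_d = pd.Timestamp("2020-06-15")  # hypothetical cutoff

# Compare while still a DatetimeIndex (pure pandas); the result is a 1-D bool array.
flat_mask = months > st_d
# Give the mask the same (4, 3) layout as the reshaped dates.
mask = flat_mask.reshape(-1, 4).T
dates = months.to_numpy().reshape(-1, 4).T
print(mask.shape)  # (4, 3)
```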

`_133.to_numpy().reshape(-1,4).T` gives the `x_month_begin` array directly.
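
For the question's 36 months, the same idea reads `reshape(3, -1).T` (three rows of twelve months, then a transpose). A quick check, under that assumption, that it matches the split/vstack/transpose route:

```python
import numpy as np
import pandas as pd

# The question's 36 month starts (2017-01 to 2019-12) as a datetime64[ns] array.
months = pd.date_range("2017-01-01", "2019-12-01", freq="MS").to_numpy()

# Original approach: split into 3 years, stack them as rows, transpose.
via_split = np.transpose(np.vstack(np.split(months, 3)))
# Direct approach: 3 rows of 12 months, then transpose so each column is a year.
via_reshape = months.reshape(3, -1).T

print(via_reshape.shape)                       # (12, 3)
print(np.array_equal(via_split, via_reshape))  # True
```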
