Get year, month or day from numpy datetime64

I have an array of datetime64 type:

dates = np.datetime64(['2010-10-17', '2011-05-13', "2012-01-15"])

Is there a better way than looping through each element just to get an np.array of years:

years = f(dates)
#output:
array([2010, 2011, 2012], dtype=int8) #or dtype = string

I'm using stable numpy version 1.6.2.

I find the following tricks give a 2x to 4x speed increase versus the pandas method described in this answer (i.e. pd.DatetimeIndex(dates).year etc.). I find the speed of [dt.year for dt in dates.astype(object)] to be similar to the pandas method. These tricks can also be applied directly to ndarrays of any shape (2D, 3D etc.).

dates = np.arange(np.datetime64('2000-01-01'), np.datetime64('2010-01-01'))
years = dates.astype('datetime64[Y]').astype(int) + 1970
months = dates.astype('datetime64[M]').astype(int) % 12 + 1
days = dates - dates.astype('datetime64[M]') + 1
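As a quick check of the claim that these casts work on arrays of any shape, here is a minimal sketch on a small 2-D array. The array contents and the name `grid` are made up for illustration; the only technique used is the unit cast shown above, plus a cast to int so that the day numbers come out as plain integers rather than timedelta64 values.

```python
import numpy as np

# Hypothetical 2x3 array of day-resolution dates (values chosen for illustration).
grid = np.array([['2010-10-17', '2011-05-13', '2012-01-15'],
                 ['1999-12-31', '2000-02-29', '2004-07-01']],
                dtype='datetime64[D]')

# Casting to a coarser unit truncates; the integer value counts units since 1970.
years = grid.astype('datetime64[Y]').astype(int) + 1970
months = grid.astype('datetime64[M]').astype(int) % 12 + 1
# The subtraction yields timedelta64[D]; cast to int to get plain day numbers.
days = (grid - grid.astype('datetime64[M]')).astype(int) + 1

print(years.tolist())   # [[2010, 2011, 2012], [1999, 2000, 2004]]
print(months.tolist())  # [[10, 5, 1], [12, 2, 7]]
print(days.tolist())    # [[17, 13, 15], [31, 29, 1]]
```

All three results keep the shape of the input, so no reshaping is needed afterwards.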

As datetime is not stable in numpy, I would use pandas for this:

In [52]: import pandas as pd

In [53]: dates = pd.DatetimeIndex(['2010-10-17', '2011-05-13', "2012-01-15"])

In [54]: dates.year
Out[54]: array([2010, 2011, 2012], dtype=int32)

Pandas uses numpy datetime internally, but seems to avoid the shortcomings that numpy has had up to now.

There should be an easier way to do this, but, depending on what you're trying to do, the best route might be to convert to a regular Python datetime object:

import numpy as np

datetime64Obj = np.datetime64('2002-07-04T02:55:41-0700')
print(datetime64Obj.astype(object).year)
# 2002
print(datetime64Obj.astype(object).day)
# 4

Based on the comments below, this seems to work only in Python 2.7.x and Python 3.6+.
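One thing worth checking before relying on `.astype(object)` is the unit of the value. As far as I know, NumPy returns a `datetime.date` for day resolution, a `datetime.datetime` for second to microsecond resolution, and a plain integer for nanosecond resolution, because Python's `datetime` cannot hold nanoseconds. A minimal sketch, with a sample timestamp chosen for illustration:

```python
import datetime
import numpy as np

stamp = np.datetime64('2002-07-04T02:55:41')

as_day = stamp.astype('datetime64[D]').astype(object)   # datetime.date
as_us = stamp.astype('datetime64[us]').astype(object)   # datetime.datetime
as_ns = stamp.astype('datetime64[ns]').astype(object)   # int (nanoseconds since epoch)

print(type(as_day), type(as_us), type(as_ns))

# For nanosecond data, drop to microseconds before converting to an object.
year = stamp.astype('datetime64[ns]').astype('datetime64[us]').astype(object).year
print(year)  # 2002
```

This matters in practice because values that come out of pandas are usually nanosecond resolution, so `.year` on the converted object would fail there without the extra cast.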

This is how I do it.

import numpy as np

def dt2cal(dt):
    """
    Convert array of datetime64 to a calendar array of year, month, day, hour,
    minute, seconds, microsecond with these quantities indexed on the last axis.

    Parameters
    ----------
    dt : datetime64 array (...)
        numpy.ndarray of datetimes of arbitrary shape

    Returns
    -------
    cal : uint32 array (..., 7)
        calendar array with last axis representing year, month, day, hour,
        minute, second, microsecond
    """

    # allocate output 
    out = np.empty(dt.shape + (7,), dtype="u4")
    # decompose calendar floors
    Y, M, D, h, m, s = [dt.astype(f"M8[{x}]") for x in "YMDhms"]
    out[..., 0] = Y + 1970 # Gregorian Year
    out[..., 1] = (M - Y) + 1 # month
    out[..., 2] = (D - M) + 1 # day
    out[..., 3] = (dt - D).astype("m8[h]") # hour
    out[..., 4] = (dt - h).astype("m8[m]") # minute
    out[..., 5] = (dt - m).astype("m8[s]") # second
    out[..., 6] = (dt - s).astype("m8[us]") # microsecond
    return out

It's vectorized across arbitrary input dimensions, it's fast, it's intuitive, it works on numpy v1.15.4, and it doesn't use pandas.

I really wish numpy supported this functionality; it's required all the time in application development. I always get super nervous when I have to roll my own stuff like this, because I always feel like I'm missing an edge case.
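One way to ease that worry is to cross-check the floor-based arithmetic against Python's own `datetime.date` over a range that includes leap days and dates before 1970. This is only a sketch of such a check: it uses a compact year/month/day helper (`ymd`, a name invented here) rather than the full `dt2cal` above, and the date range is arbitrary.

```python
import numpy as np

def ymd(dt):
    """Return (year, month, day) integer arrays for a datetime64 array."""
    Y = dt.astype('M8[Y]')
    M = dt.astype('M8[M]')
    D = dt.astype('M8[D]')
    year = Y.astype(int) + 1970
    month = (M - Y).astype(int) + 1
    day = (D - M).astype(int) + 1
    return year, month, day

# Arbitrary test range: every day from 1960 through 1979 (spans the 1970 epoch).
dates = np.arange('1960-01-01', '1980-01-01', dtype='M8[D]')
year, month, day = ymd(dates)

# Reference values taken from Python's datetime.date objects.
ref = np.array([(d.year, d.month, d.day) for d in dates.tolist()])

ok = (np.array_equal(year, ref[:, 0])
      and np.array_equal(month, ref[:, 1])
      and np.array_equal(day, ref[:, 2]))
print(ok)
```

A check like this does not prove the absence of edge cases, but it does cover the sign change at the epoch and several leap years in one pass.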

Using numpy version 1.10.4 and pandas version 0.17.1,

dates = np.array(['2010-10-17', '2011-05-13', '2012-01-15'], dtype=np.datetime64)
pd.to_datetime(dates).year

I get what you're looking for:

array([2010, 2011, 2012], dtype=int32)

Use dates.tolist() to convert to native datetime objects, then simply access year. Example:

>>> dates = np.array(['2010-10-17', '2011-05-13', '2012-01-15'], dtype='datetime64')
>>> [x.year for x in dates.tolist()]
[2010, 2011, 2012]

This is basically the same idea presented in https://stackoverflow.com/a/35281829/2192272, but using simpler syntax.

Tested with python 3.6 / numpy 1.18.

Another possibility is:

np.datetime64(dates,'Y') - returns - numpy.datetime64('2010')

or

np.datetime64(dates,'Y').astype(int)+1970 - returns - 2010

but this works only on scalar values; it won't take an array.

If you upgrade to numpy 1.7 (where datetime is still labeled as experimental), the following should work.

dates/np.timedelta64(1,'Y')

Anon's answer works great for me, but I just needed to modify the statement for days

from:

days = dates - dates.astype('datetime64[M]') + 1

to:

days = dates.astype('datetime64[D]') - dates.astype('datetime64[M]') + 1
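To see why the extra cast matters, here is a small sketch with second-resolution timestamps (the sample values are invented). Without the cast to `datetime64[D]`, the subtraction is carried out in seconds, so the result is an elapsed-time offset into the month rather than a day of the month.

```python
import numpy as np

# Invented sample: timestamps with second resolution.
stamps = np.array(['2010-10-17T08:30:00', '2012-01-15T23:59:59'],
                  dtype='datetime64[s]')

month_start = stamps.astype('datetime64[M]')

# Without the day cast, the difference is a timedelta64[s] (seconds into the month).
raw = stamps - month_start
print(raw.dtype)  # timedelta64[s]

# Truncating to days first gives whole days into the month.
days = (stamps.astype('datetime64[D]') - month_start).astype(int) + 1
print(days.tolist())  # [17, 15]
```

For input that is already day resolution, both statements give the same result, which is presumably why the original form worked in its own example.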

There's no direct way to do it yet, unfortunately, but there are a couple of indirect ways:

[dt.year for dt in dates.astype(object)]

or

[datetime.datetime.strptime(repr(d), "%Y-%m-%d %H:%M:%S").year for d in dates]

Both are inspired by the examples here.

Both of these work for me on Numpy 1.6.1. You may need to be a bit more careful with the second one, since the repr() for the datetime64 might have a fractional part after a decimal point.

Convert np.datetime64 to float-year

In this solution, you can see, step by step, how to process np.datetime64 datatypes.

In the following, dt64 is of type np.datetime64 (or even a numpy.ndarray of that type):

  • year = dt64.astype('M8[Y]') contains just the year. If you want a float array of that, do 1970 + year.astype(float).
  • The days (since January 1st) can be accessed with days = (dt64 - year).astype('timedelta64[D]').
  • You can also deduce whether a year is a leap year or not (compare days_of_year).

See also the numpy tutorial.
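The leap-year remark can be made concrete with a short sketch: count the days between the start of a year and the start of the next one, and compare with 366. The sample dates and the names `day_of_year`, `days_in_year` and `is_leap` are chosen for illustration only.

```python
import numpy as np

dt64 = np.array(['1900-06-01', '2000-06-01', '2019-12-31', '2020-12-31'],
                dtype='datetime64[D]')

year = dt64.astype('M8[Y]')
# Day of year, counting from 0 on January 1st.
day_of_year = (dt64 - year).astype(int)
# Length of each year in days: start of next year minus start of this year.
days_in_year = ((year + 1).astype('M8[D]') - year.astype('M8[D]')).astype(int)
is_leap = days_in_year == 366

print(day_of_year.tolist())  # [151, 152, 364, 365]
print(is_leap.tolist())      # [False, True, False, True]
```

Letting NumPy compute the year length avoids hand-coding the divisible-by-100 and divisible-by-400 rules, as the 1900 and 2000 entries show.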

import numpy as np
import pandas as pd

def dt64_to_float(dt64):
    """Converts numpy.datetime64 to year as float.

    Rounded to days

    Parameters
    ----------
    dt64 : np.datetime64 or np.ndarray(dtype='datetime64[X]')
        date data

    Returns
    -------
    float or np.ndarray(dtype=float)
        Year in floating point representation
    """

    year = dt64.astype('M8[Y]')
    # print('year:', year)
    days = (dt64 - year).astype('timedelta64[D]')
    # print('days:', days)
    year_next = year + np.timedelta64(1, 'Y')
    # print('year_next:', year_next)
    days_of_year = (year_next.astype('M8[D]') - year.astype('M8[D]')
                    ).astype('timedelta64[D]')
    # print('days_of_year:', days_of_year)
    dt_float = 1970 + year.astype(float) + days / (days_of_year)
    # print('dt_float:', dt_float)
    return dt_float

if __name__ == "__main__":

    dt_str = '2011-11-11'
    dt64 = np.datetime64(dt_str)
    print(dt_str, 'as float:', dt64_to_float(dt64))
    print()

    dates = np.array([
        '1970-01-01', '2014-01-01', '2020-12-31', '2019-12-31', '2010-04-28'],
        dtype='datetime64[D]')
    float_dates = dt64_to_float(dates)


    print('dates:      ', dates)
    print('float_dates:', float_dates)

Output:

2011-11-11 as float: 2011.8602739726027

dates:       ['1970-01-01' '2014-01-01' '2020-12-31' '2019-12-31' '2010-04-28']
float_dates: [1970.         2014.         2020.99726776 2019.99726027 2010.32054795]

This is obviously quite late, but I benefitted from one of the answers, so I'm sharing my bit here.

The answer by Anon 🤔 is quite right: the numpy method is far faster than first casting the values to a pandas datetime series and then extracting the dates. However, the offsetting and conversion of results after the numpy transformations are a bit untidy, so a cleaner helper can be written, like so:

import numpy as np

def from_numpy_datetime_extract(date: np.datetime64, extract_attribute: str = None):
    _YEAR_OFFSET = 1970   # datetime64[Y] counts years since 1970
    _MONTH_OFFSET = 1     # months are 1-based
    _MONTH_FACTOR = 12
    _DAY_OFFSET = 1       # days of the month are 1-based

    if extract_attribute == 'year':
        return date.astype('datetime64[Y]').astype(int) + _YEAR_OFFSET
    elif extract_attribute == 'month':
        return date.astype('datetime64[M]').astype(int) % _MONTH_FACTOR + _MONTH_OFFSET
    elif extract_attribute == 'day':
        # Truncate to whole days first so the result is independent of the input resolution.
        return (date.astype('datetime64[D]') - date.astype('datetime64[M]')).astype(int) + _DAY_OFFSET
    else:
        raise ValueError("extract_attribute should be one of 'year', 'month' or 'day'")

Applied to the dates from the question, dates = np.array(['2010-10-17', '2011-05-13', "2012-01-15"], dtype = 'datetime64'):

  • Numpy method (using the helper above)
%timeit -r10 -n1000 [from_numpy_datetime_extract(x, "year") for x in dates]
# 14.3 µs ± 4.03 µs per loop (mean ± std. dev. of 10 runs, 1000 loops each)
  • Pandas method
%timeit -r10 -n1000 pd.to_datetime(dates).year.tolist()
# 304 µs ± 32.2 µs per loop (mean ± std. dev. of 10 runs, 1000 loops each)
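Note that the NumPy timing above still loops over the elements in Python. Since the casts are vectorized, the same arithmetic can presumably be applied once to the whole array. A rough sketch for comparing the two with the standard `timeit` module follows; `year_of` is a stand-in name for the year branch of the helper, the array size is arbitrary, and no timings are claimed here because they depend on the machine and the array size.

```python
import timeit
import numpy as np

def year_of(d):
    """Year of a datetime64 scalar or array (stand-in for the 'year' branch above)."""
    return d.astype('datetime64[Y]').astype(int) + 1970

# Arbitrary sample: ten years of consecutive days.
dates = np.arange('2000-01-01', '2010-01-01', dtype='datetime64[D]')

looped = [year_of(x) for x in dates]   # one call per element
vectorized = year_of(dates)            # one call for the whole array

t_loop = timeit.timeit(lambda: [year_of(x) for x in dates], number=20)
t_vec = timeit.timeit(lambda: year_of(dates), number=20)
print(f"loop: {t_loop:.4f}s  vectorized: {t_vec:.4f}s")
```

Both variants return the same years, so the choice between them is purely about speed and about whether a list or an ndarray is wanted.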

How about simply converting to string?

Probably the easiest way:

import numpy as np

date = np.datetime64("2000-01-01")
date_strings = date.astype(str).split('-')
# >> ['2000', '01', '01']

year_int = int(date_strings[0])
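The snippet above handles a single value. For an array, a comparable string-based sketch can use `np.datetime_as_string`; the variable names are illustrative, and this route is likely slower than the integer casts shown in other answers.

```python
import numpy as np

dates = np.array(['2010-10-17', '2011-05-13', '2012-01-15'], dtype='datetime64[D]')

# ISO strings such as '2010-10-17', one per element.
strings = np.datetime_as_string(dates, unit='D')

# Split each string into its year, month and day parts and convert them to int.
parts = np.array([[int(p) for p in s.split('-')] for s in strings])
years, months, days = parts[:, 0], parts[:, 1], parts[:, 2]

print(years.tolist())   # [2010, 2011, 2012]
print(months.tolist())  # [10, 5, 1]
print(days.tolist())    # [17, 13, 15]
```

Passing `unit='D'` keeps the strings in a fixed `YYYY-MM-DD` layout even when the input has a finer resolution, which is what makes the simple split safe for these dates.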
