Iterating through a range of dates in Python

I have the following code to do this, but how can I do it better? Right now I think it's better than nested loops, but it starts to get Perl-one-linerish when you have a generator in a list comprehension.

day_count = (end_date - start_date).days + 1
for single_date in [d for d in (start_date + timedelta(n) for n in range(day_count)) if d <= end_date]:
    print(single_date.strftime("%Y-%m-%d"))

Notes

  • I'm not actually using this to print. That's just for demo purposes.
  • The start_date and end_date variables are datetime.date objects because I don't need the timestamps. (They're going to be used to generate a report.)

Sample Output

For a start date of 2009-05-30 and an end date of 2009-06-09:

2009-05-30
2009-05-31
2009-06-01
2009-06-02
2009-06-03
2009-06-04
2009-06-05
2009-06-06
2009-06-07
2009-06-08
2009-06-09

Why are there two nested iterations? For me it produces the same list of data with only one iteration:

for single_date in (start_date + timedelta(n) for n in range(day_count)):
    print ...

And no list gets stored, only one generator is iterated over. Also the "if" in the generator seems to be unnecessary.

After all, a linear sequence should only require one iterator, not two.
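
To make that concrete, here is a small check (a sketch using the sample dates from the question) showing that the single generator yields exactly the same dates as the nested version, so the extra comprehension and the filter add nothing:

```python
from datetime import date, timedelta

start_date = date(2009, 5, 30)
end_date = date(2009, 6, 9)
day_count = (end_date - start_date).days + 1

# Nested version from the question: a generator inside a filtering list comprehension.
nested = [d for d in (start_date + timedelta(n) for n in range(day_count)) if d <= end_date]

# Single generator expression, no filter.
single = list(start_date + timedelta(n) for n in range(day_count))

print(nested == single)  # True
print(len(single))       # 11
```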

Update after discussion with John Machin:

Maybe the most elegant solution is using a generator function to completely hide/abstract the iteration over the range of dates:

from datetime import date, timedelta

def daterange(start_date, end_date):
    for n in range(int((end_date - start_date).days)):
        yield start_date + timedelta(n)

start_date = date(2013, 1, 1)
end_date = date(2015, 6, 2)
for single_date in daterange(start_date, end_date):
    print(single_date.strftime("%Y-%m-%d"))

NB: For consistency with the built-in range() function, this iteration stops before reaching end_date. So for inclusive iteration, pass the next day as the end, as you would with range().
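
As a minimal sketch of the inclusive case (reusing the sample dates from the question, and repeating the generator so the snippet stands alone), passing the day after end_date makes end_date itself the last value:

```python
from datetime import date, timedelta

def daterange(start_date, end_date):
    # Half-open like range(): yields start_date up to, but not including, end_date.
    for n in range((end_date - start_date).days):
        yield start_date + timedelta(n)

start_date = date(2009, 5, 30)
end_date = date(2009, 6, 9)

# Pass the day after end_date so that end_date itself is included.
inclusive = list(daterange(start_date, end_date + timedelta(days=1)))
print(inclusive[0], inclusive[-1], len(inclusive))  # 2009-05-30 2009-06-09 11
```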

This might be more clear:

from datetime import date, timedelta

start_date = date(2019, 1, 1)
end_date = date(2020, 1, 1)
delta = timedelta(days=1)
while start_date <= end_date:
    print(start_date.strftime("%Y-%m-%d"))
    start_date += delta

Use the dateutil library:

from datetime import date
from dateutil.rrule import rrule, DAILY

a = date(2009, 5, 30)
b = date(2009, 6, 9)

for dt in rrule(DAILY, dtstart=a, until=b):
    print(dt.strftime("%Y-%m-%d"))

This Python library has many more advanced features, some very useful, like relativedelta, and is implemented as a single file (module) that's easily included into a project.

Pandas is great for time series in general, and has direct support for date ranges.

import pandas as pd
daterange = pd.date_range(start_date, end_date)

You can then loop over the daterange to print the date:

for single_date in daterange:
    print (single_date.strftime("%Y-%m-%d"))

It also has lots of options to make life easier. For example, if you only wanted weekdays, you would just swap in bdate_range. See http://pandas.pydata.org/pandas-docs/stable/timeseries.html#generating-ranges-of-timestamps
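
If pandas is not available, the same weekday filter can be sketched with the standard library alone. This is only an illustration: weekdays_between is a hypothetical helper name, and holidays are not considered.

```python
from datetime import date, timedelta

def weekdays_between(start, end):
    """Yield Monday-to-Friday dates from start through end (inclusive)."""
    current = start
    while current <= end:
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            yield current
        current += timedelta(days=1)

# 2009-05-30 and 2009-05-31 fall on a weekend, so they are skipped.
for d in weekdays_between(date(2009, 5, 30), date(2009, 6, 3)):
    print(d.isoformat())
```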

The power of Pandas is really its dataframes, which support vectorized operations (much like numpy) that make operations across large quantities of data very fast and easy.

EDIT: You could also completely skip the for loop and just print it directly, which is easier and more efficient:

print(daterange)

import datetime

def daterange(start, stop, step=datetime.timedelta(days=1), inclusive=False):
  # inclusive=False to behave like range by default
  if step.days > 0:
    while start < stop:
      yield start
      start = start + step
      # not +=! don't modify object passed in if it's mutable
      # since this function is not restricted to
      # only types from datetime module
  elif step.days < 0:
    while start > stop:
      yield start
      start = start + step
  if inclusive and start == stop:
    yield start

# ...

for date in daterange(start_date, end_date, inclusive=True):
  print(date.strftime("%Y-%m-%d"))

This function does more than you strictly require, by supporting negative step, etc. As long as you factor out your range logic, you don't need the separate day_count, and most importantly the code becomes easier to read as you call the function from multiple places.

This is the most human-readable solution I can think of.

import datetime

def daterange(start, end, step=datetime.timedelta(1)):
    curr = start
    while curr < end:
        yield curr
        curr += step

Why not try:

import datetime as dt

start_date = dt.datetime(2012, 12,1)
end_date = dt.datetime(2012, 12,5)

total_days = (end_date - start_date).days + 1 #inclusive 5 days

for day_number in range(total_days):
    current_date = (start_date + dt.timedelta(days = day_number)).date()
    print(current_date)

NumPy's arange function can be applied to dates:

import numpy as np
from datetime import datetime, timedelta
d0 = datetime(2009, 1,1)
d1 = datetime(2010, 1,1)
dt = timedelta(days = 1)
dates = np.arange(d0, d1, dt).astype(datetime)

The use of astype is to convert from numpy.datetime64 to an array of datetime.datetime objects.

Show the next n days starting from today:

import datetime
for i in range(0, 100):
    print((datetime.date.today() + datetime.timedelta(i)).isoformat())

Output (truncated):

2016-06-29
2016-06-30
2016-07-01
2016-07-02
2016-07-03
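
The loop above walks forward from today. To list days in the past instead, subtract the offset rather than adding it. A minimal sketch, where last_n_days is a hypothetical helper and the fixed reference date is only there to keep the example reproducible:

```python
from datetime import date, timedelta

def last_n_days(n, today=None):
    """Yield the n most recent dates, newest first, starting with today."""
    today = today or date.today()
    for i in range(n):
        yield today - timedelta(days=i)

# A fixed reference date keeps the output stable; omit it to use the real current date.
for d in last_n_days(3, today=date(2016, 6, 29)):
    print(d.isoformat())
# 2016-06-29
# 2016-06-28
# 2016-06-27
```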
2016-07-04

For completeness, Pandas also has a period_range function for dates that are out of bounds for a nanosecond-resolution Timestamp (roughly before 1677 or after 2262):

import pandas as pd

pd.period_range(start='1/1/1626', end='1/08/1627', freq='D')

import datetime

def daterange(start, stop, step_days=1):
    current = start
    step = datetime.timedelta(step_days)
    if step_days > 0:
        while current < stop:
            yield current
            current += step
    elif step_days < 0:
        while current > stop:
            yield current
            current += step
    else:
        raise ValueError("daterange() step_days argument must not be zero")

if __name__ == "__main__":
    from pprint import pprint as pp
    lo = datetime.date(2008, 12, 27)
    hi = datetime.date(2009, 1, 5)
    pp(list(daterange(lo, hi)))
    pp(list(daterange(hi, lo, -1)))
    pp(list(daterange(lo, hi, 7)))
    pp(list(daterange(hi, lo, -7))) 
    assert not list(daterange(lo, hi, -1))
    assert not list(daterange(hi, lo))
    assert not list(daterange(lo, hi, -7))
    assert not list(daterange(hi, lo, 7))

import datetime

for i in range(16):
    print(datetime.date.today() + datetime.timedelta(days=i))

I have a similar problem, but I need to iterate monthly instead of daily.

This is my solution:

import calendar
from datetime import datetime, timedelta

def days_in_month(dt):
    return calendar.monthrange(dt.year, dt.month)[1]

def monthly_range(dt_start, dt_end):
    forward = dt_end >= dt_start
    finish = False
    dt = dt_start

    while not finish:
        yield dt.date()
        if forward:
            days = days_in_month(dt)
            dt = dt + timedelta(days=days)            
            finish = dt > dt_end
        else:
            _tmp_dt = dt.replace(day=1) - timedelta(days=1)
            dt = (_tmp_dt.replace(day=dt.day))
            finish = dt < dt_end

Example #1

date_start = datetime(2016, 6, 1)
date_end = datetime(2017, 1, 1)

for p in monthly_range(date_start, date_end):
    print(p)

Output

2016-06-01
2016-07-01
2016-08-01
2016-09-01
2016-10-01
2016-11-01
2016-12-01
2017-01-01

Example #2

date_start = datetime(2017, 1, 1)
date_end = datetime(2016, 6, 1)

for p in monthly_range(date_start, date_end):
    print(p)

Output

2017-01-01
2016-12-01
2016-11-01
2016-10-01
2016-09-01
2016-08-01
2016-07-01
2016-06-01
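
Note that stepping by the number of days in the current month only lands on the same day of the month for days that every month has, such as the 1st used in both examples. As an alternative sketch (not the method above), one can step on the year and month directly and clamp the day to each month's length. The name month_range is hypothetical, and this version only iterates forward:

```python
import calendar
from datetime import date

def month_range(start, end):
    """Yield one date per month from start through end (inclusive), forward only.

    The day of month is taken from start and clamped to the length of each
    month, so a start on the 30th yields the 28th (or 29th) in February.
    """
    year, month = start.year, start.month
    while True:
        last_day = calendar.monthrange(year, month)[1]
        current = date(year, month, min(start.day, last_day))
        if current > end:
            break
        yield current
        # Advance one calendar month, rolling over the year after December.
        month += 1
        if month > 12:
            year, month = year + 1, 1

for d in month_range(date(2016, 11, 30), date(2017, 3, 31)):
    print(d)
# 2016-11-30
# 2016-12-30
# 2017-01-30
# 2017-02-28
# 2017-03-30
```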

You can generate a series of dates between two dates simply and reliably using the pandas library:

import pandas as pd

print(pd.date_range(start='1/1/2010', end='1/08/2018', freq='M'))

You can change the frequency of the generated dates by setting freq to D, M, Q, or Y (daily, monthly, quarterly, yearly). Note that M, Q, and Y anchor on period ends, so freq='M' yields the last day of each month.

> pip install DateTimeRange

import datetime
from datetimerange import DateTimeRange

def dateRange(start, end, step):
    rangeList = []
    time_range = DateTimeRange(start, end)
    # Step through the range in increments of `step` days
    for value in time_range.range(datetime.timedelta(days=step)):
        rangeList.append(value.strftime('%m/%d/%Y'))
    return rangeList

dateRange("2018-09-07", "2018-12-25", 7)

Out[92]:
['09/07/2018',
 '09/14/2018',
 '09/21/2018',
 '09/28/2018',
 '10/05/2018',
 '10/12/2018',
 '10/19/2018',
 '10/26/2018',
 '11/02/2018',
 '11/09/2018',
 '11/16/2018',
 '11/23/2018',
 '11/30/2018',
 '12/07/2018',
 '12/14/2018',
 '12/21/2018']

Using pendulum.period:

import pendulum

start = pendulum.from_format('2020-05-01', 'YYYY-MM-DD', formatter='alternative')
end = pendulum.from_format('2020-05-02', 'YYYY-MM-DD', formatter='alternative')

period = pendulum.period(start, end)

for dt in period:
    print(dt.to_date_string())

For those who are interested in a Pythonic functional approach:

from datetime import date, timedelta
from itertools import count, takewhile

for d in takewhile(lambda x: x<=date(2009,6,9), map(lambda x:date(2009,5,30)+timedelta(days=x), count())):
    print(d)
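
The same idea reads more easily when wrapped in a small function so the start and end dates are not repeated inside the lambdas. A sketch, where daterange_inclusive is a hypothetical name:

```python
from datetime import date, timedelta
from itertools import count, takewhile

def daterange_inclusive(start, end):
    """Lazily yield every date from start through end."""
    # Unbounded stream of consecutive days; takewhile cuts it off after end.
    days = (start + timedelta(days=n) for n in count())
    return takewhile(lambda d: d <= end, days)

dates = list(daterange_inclusive(date(2009, 5, 30), date(2009, 6, 9)))
print(dates[0], dates[-1], len(dates))  # 2009-05-30 2009-06-09 11
```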

This function has some extra features:

  • can pass a string matching the DATE_FORMAT for start or end and it is converted to a date object
  • can pass a date object for start or end
  • error checking in case the end is older than the start

import datetime
from datetime import timedelta

DATE_FORMAT = '%Y/%m/%d'

def daterange(start, end):
    def convert(date):
        # Strings are parsed with DATE_FORMAT; date objects pass through unchanged.
        try:
            date = datetime.datetime.strptime(date, DATE_FORMAT)
            return date.date()
        except TypeError:
            return date

    def get_date(n):
        return (convert(start) + timedelta(days=n)).strftime(DATE_FORMAT)

    days = (convert(end) - convert(start)).days
    if days <= 0:
        raise ValueError('The start date must be before the end date.')
    for n in range(0, days):
        yield get_date(n)

start = '2014/12/1'
end = '2014/12/31'
print(list(daterange(start, end)))

# A date object also works for start or end (a fixed date is used here so the example stays valid).
start = datetime.date(2015, 11, 1)
end = '2015/12/1'
print(list(daterange(start, end)))

Here's code for a general date range function, similar to Ber's answer, but more flexible:

import datetime
import functools


def count_timedelta(delta, step, seconds_in_interval):
    """Helper function for iterate.  Finds the number of intervals in the timedelta."""
    return int(delta.total_seconds() / (seconds_in_interval * step))


def range_dt(start, end, step=1, interval='day'):
    """Iterate over datetimes or dates, similar to builtin range."""
    intervals = functools.partial(count_timedelta, (end - start), step)

    if interval == 'week':
        for i in range(intervals(3600 * 24 * 7)):
            yield start + datetime.timedelta(weeks=i) * step

    elif interval == 'day':
        for i in range(intervals(3600 * 24)):
            yield start + datetime.timedelta(days=i) * step

    elif interval == 'hour':
        for i in range(intervals(3600)):
            yield start + datetime.timedelta(hours=i) * step

    elif interval == 'minute':
        for i in range(intervals(60)):
            yield start + datetime.timedelta(minutes=i) * step

    elif interval == 'second':
        for i in range(intervals(1)):
            yield start + datetime.timedelta(seconds=i) * step

    elif interval == 'millisecond':
        for i in range(intervals(1 / 1000)):
            yield start + datetime.timedelta(milliseconds=i) * step

    elif interval == 'microsecond':
        for i in range(intervals(1e-6)):
            yield start + datetime.timedelta(microseconds=i) * step

    else:
        raise AttributeError("Interval must be 'week', 'day', 'hour', 'minute', 'second', "
                             "'millisecond' or 'microsecond'.")

from datetime import date,timedelta
delta = timedelta(days=1)
start = date(2020,1,1)
end=date(2020,9,1)
loop_date = start
while loop_date<=end:
    print(loop_date)
    loop_date+=delta

You can use Arrow:

This is an example from the docs, iterating over hours:

>>> from datetime import datetime
>>> from arrow import Arrow
>>> start = datetime(2013, 5, 5, 12, 30)
>>> end = datetime(2013, 5, 5, 17, 15)
>>> for r in Arrow.range('hour', start, end):
...     print(repr(r))
...
<Arrow [2013-05-05T12:30:00+00:00]>
<Arrow [2013-05-05T13:30:00+00:00]>
<Arrow [2013-05-05T14:30:00+00:00]>
<Arrow [2013-05-05T15:30:00+00:00]>
<Arrow [2013-05-05T16:30:00+00:00]>

To iterate over days, you can do this:

>>> start = Arrow(2013, 5, 5)
>>> end = Arrow(2013, 5, 5)
>>> for r in Arrow.range('day', start, end):
...     print(repr(r))

(I didn't check whether you can pass datetime.date objects, but Arrow objects are easier to work with in general anyway.)

What about the following for doing a range incremented by days:

for d in map(lambda x: startDate + datetime.timedelta(days=x), range((stopDate - startDate).days)):
  # Do stuff here

  • startDate and stopDate are datetime.date objects

For a generic version:

for d in map(lambda x: startTime + x * stepTime, range(int((stopTime - startTime).total_seconds() / stepTime.total_seconds()))):
  # Do stuff here

  • startTime and stopTime are datetime.date or datetime.datetime objects (both should be the same type)
  • stepTime is a timedelta object

Note that .total_seconds() is only available in Python 2.7 and later. If you are stuck with an earlier version, you can write your own function:

def total_seconds( td ):
  return float(td.microseconds + (td.seconds + td.days * 24 * 3600) * 10**6) / 10**6
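
On Python 3, timedelta objects also support floor division by another timedelta, which gives the step count as an int without going through total_seconds(). The following is a sketch of the generic version built on that, where step_range is a hypothetical name. Unlike the truncating division above, it rounds the count up so that a partial final step is still included, matching how range() behaves:

```python
from datetime import datetime, timedelta

def step_range(start, stop, step):
    """Yield start, start + step, ... for every value before stop (positive step only)."""
    if step <= timedelta(0):
        raise ValueError("step must be a positive timedelta")
    # Ceiling division via negated floor division, so a partial last step is counted.
    count = -((start - stop) // step)
    for i in range(count):
        yield start + i * step

start = datetime(2020, 10, 10, 10, 0)
stop = datetime(2020, 10, 10, 12, 0)
for t in step_range(start, stop, timedelta(minutes=30)):
    print(t)
# 2020-10-10 10:00:00
# 2020-10-10 10:30:00
# 2020-10-10 11:00:00
# 2020-10-10 11:30:00
```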

A slightly different approach to reversible steps, storing the range args in a tuple:

from datetime import timedelta

def date_range(start, stop, step=1, inclusive=False):
    day_count = (stop - start).days
    if inclusive:
        day_count += 1

    if step > 0:
        range_args = (0, day_count, step)
    elif step < 0:
        range_args = (day_count - 1, -1, step)
    else:
        raise ValueError("date_range(): step arg must be non-zero")

    for i in range(*range_args):
        yield start + timedelta(days=i)

import datetime
from dateutil.rrule import DAILY,rrule

date=datetime.datetime(2019,1,10)

date1=datetime.datetime(2019,2,2)

for i in rrule(DAILY , dtstart=date,until=date1):
     print(i.strftime('%Y%b%d'),sep='\n')

Output:

2019Jan10
2019Jan11
2019Jan12
2019Jan13
2019Jan14
2019Jan15
2019Jan16
2019Jan17
2019Jan18
2019Jan19
2019Jan20
2019Jan21
2019Jan22
2019Jan23
2019Jan24
2019Jan25
2019Jan26
2019Jan27
2019Jan28
2019Jan29
2019Jan30
2019Jan31
2019Feb01
2019Feb02

If you are going to use a dynamic timedelta, then you can use:

1. With a while loop

from datetime import datetime, timedelta
from typing import Generator


def datetime_range(start: datetime, end: datetime, delta: timedelta) -> Generator[datetime, None, None]:
    while start <= end:
        yield start
        start += delta

2. With a for loop

from datetime import datetime, timedelta
from typing import Generator


def datetime_range(start: datetime, end: datetime, delta: timedelta) -> Generator[datetime, None, None]:
    delta_units = int((end - start) / delta)

    for _ in range(delta_units + 1):
        yield start
        start += delta

3. If you are using async/await

from typing import AsyncGenerator


async def datetime_range(start: datetime, end: datetime, delta: timedelta) -> AsyncGenerator[datetime, None]:
    delta_units = int((end - start) / delta)

    for _ in range(delta_units + 1):
        yield start
        start += delta

4. List comprehension

from typing import List


def datetime_range(start: datetime, end: datetime, delta: timedelta) -> List[datetime]:
    delta_units = int((end - start) / delta)
    return [start + (delta * index) for index in range(delta_units + 1)]

Solutions 1 and 2 can then be used like this:

start = datetime(2020, 10, 10, 10, 00)
end = datetime(2022, 10, 10, 18, 00)
delta = timedelta(minutes=30)

result = [time_part for time_part in datetime_range(start, end, delta)]
# or 
for time_part in datetime_range(start, end, delta):
    print(time_part)

The third solution can be used like this in an async context, because it returns an async generator object, which can only be consumed in an async context:

start = datetime(2020, 10, 10, 10, 00)
end = datetime(2022, 10, 10, 18, 00)
delta = timedelta(minutes=30)

result = [time_part async for time_part in datetime_range(start, end, delta)]

async for time_part in datetime_range(start, end, delta):
    print(time_part)
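
Since "async for" is only valid inside a coroutine, a complete script needs a wrapper coroutine and an event loop. A minimal runnable sketch, assuming the third solution as defined above; the wrapper name collect and the shorter two-hour window are illustrative choices:

```python
import asyncio
from datetime import datetime, timedelta
from typing import AsyncGenerator, List

async def datetime_range(start: datetime, end: datetime, delta: timedelta) -> AsyncGenerator[datetime, None]:
    delta_units = int((end - start) / delta)
    for _ in range(delta_units + 1):
        yield start
        start += delta

async def collect() -> List[datetime]:
    start = datetime(2020, 10, 10, 10, 0)
    end = datetime(2020, 10, 10, 12, 0)
    # The async comprehension must live inside a coroutine, hence this wrapper.
    return [t async for t in datetime_range(start, end, timedelta(minutes=30))]

result = asyncio.run(collect())
print(len(result), result[0], result[-1])  # 5 2020-10-10 10:00:00 2020-10-10 12:00:00
```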

The benefit of the solutions above is that all of them use a dynamic timedelta. This can be very useful when you do not know in advance which time delta you will have.

The standard pandas.date_range function serves this exact purpose and can be used as a one-liner.

Just use:

pd.date_range(start=start_date.floor('d'),end=end_date.floor('d'), freq = 'd')

Note that I am using floor here (ceil and round are also available) to snap the given timestamps to exact days. This assumes start_date and end_date are pandas Timestamp objects, since plain datetime objects have no floor method.
