Iterating through a range of dates in Python
I have the following code to do this, but how can I do it better? Right now I think it's better than nested loops, but it starts to get Perl-one-linerish when you have a generator in a list comprehension.
day_count = (end_date - start_date).days + 1
for single_date in [d for d in (start_date + timedelta(n) for n in range(day_count)) if d <= end_date]:
    print(strftime("%Y-%m-%d", single_date.timetuple()))
The start_date and end_date variables are datetime.date objects because I don't need the timestamps. (They're going to be used to generate a report.) For a start date of 2009-05-30 and an end date of 2009-06-09:
2009-05-30
2009-05-31
2009-06-01
2009-06-02
2009-06-03
2009-06-04
2009-06-05
2009-06-06
2009-06-07
2009-06-08
2009-06-09
Why are there two nested iterations? For me it produces the same list of data with only one iteration:
for single_date in (start_date + timedelta(n) for n in range(day_count)):
    print ...
And no list gets stored, only one generator is iterated over. Also the "if" in the generator seems to be unnecessary.
After all, a linear sequence should only require one iterator, not two.
Maybe the most elegant solution is using a generator function to completely hide/abstract the iteration over the range of dates:
from datetime import date, timedelta
def daterange(start_date, end_date):
    for n in range(int((end_date - start_date).days)):
        yield start_date + timedelta(n)
start_date = date(2013, 1, 1)
end_date = date(2015, 6, 2)
for single_date in daterange(start_date, end_date):
    print(single_date.strftime("%Y-%m-%d"))
NB: For consistency with the built-in range() function, this iteration stops before reaching the end_date. So for inclusive iteration use the next day, as you would with range().
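To make the NB concrete, here is a minimal sketch of the half-open behaviour and of the "use the next day" trick. It reuses the daterange generator from this answer with the dates from the question; the variable names exclusive and inclusive are only illustrative.

```python
from datetime import date, timedelta

def daterange(start_date, end_date):
    # Half-open range, like range(): end_date itself is not yielded.
    for n in range((end_date - start_date).days):
        yield start_date + timedelta(n)

start_date = date(2009, 5, 30)
end_date = date(2009, 6, 9)

# Exclusive: the last date produced is the day before end_date.
exclusive = list(daterange(start_date, end_date))

# Inclusive: pass the day after end_date as the stop value.
inclusive = list(daterange(start_date, end_date + timedelta(days=1)))

print(exclusive[-1])  # 2009-06-08
print(inclusive[-1])  # 2009-06-09
```

This mirrors range(a, b + 1) for integers, so callers who already think in half-open ranges get no surprises.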
This might be clearer:
from datetime import date, timedelta
start_date = date(2019, 1, 1)
end_date = date(2020, 1, 1)
delta = timedelta(days=1)
while start_date <= end_date:
    print(start_date.strftime("%Y-%m-%d"))
    start_date += delta
Use the dateutil library:
from datetime import date
from dateutil.rrule import rrule, DAILY
a = date(2009, 5, 30)
b = date(2009, 6, 9)
for dt in rrule(DAILY, dtstart=a, until=b):
    print(dt.strftime("%Y-%m-%d"))
This Python library has many more advanced features, some very useful, like relativedelta, and is implemented as a single file (module) that's easily included in a project.
Pandas is great for time series in general, and has direct support for date ranges.
import pandas as pd
daterange = pd.date_range(start_date, end_date)
You can then loop over the daterange to print the date:
for single_date in daterange:
    print(single_date.strftime("%Y-%m-%d"))
It also has lots of options to make life easier. For example, if you only wanted weekdays, you would just swap in bdate_range. See http://pandas.pydata.org/pandas-docs/stable/timeseries.html#generating-ranges-of-timestamps
The power of Pandas is really its dataframes, which support vectorized operations (much like numpy) that make operations across large quantities of data very fast and easy.
EDIT: You could also completely skip the for loop and just print it directly, which is easier and more efficient:
print(daterange)
import datetime
def daterange(start, stop, step=datetime.timedelta(days=1), inclusive=False):
    # inclusive=False to behave like range by default
    if step.days > 0:
        while start < stop:
            yield start
            start = start + step
            # not +=! don't modify object passed in if it's mutable
            # since this function is not restricted to
            # only types from datetime module
    elif step.days < 0:
        while start > stop:
            yield start
            start = start + step
    if inclusive and start == stop:
        yield start
# ...
for date in daterange(start_date, end_date, inclusive=True):
    print(date.strftime("%Y-%m-%d"))
This function does more than you strictly require, by supporting negative step, etc. As long as you factor out your range logic, you don't need the separate day_count, and most importantly the code becomes easier to read as you call the function from multiple places.
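As a rough illustration of what the extra flexibility buys, the sketch below restates the generator from this answer and calls it forwards, backwards, and with a weekly step. The names first, last, forward, backward, and weekly are illustrative only.

```python
import datetime

def daterange(start, stop, step=datetime.timedelta(days=1), inclusive=False):
    # Same logic as the answer's generator, restated so the example runs on its own.
    if step.days > 0:
        while start < stop:
            yield start
            start = start + step
    elif step.days < 0:
        while start > stop:
            yield start
            start = start + step
    if inclusive and start == stop:
        yield start

first = datetime.date(2009, 5, 30)
last = datetime.date(2009, 6, 9)

# Forward, including the end date.
forward = list(daterange(first, last, inclusive=True))

# Backward: swap the endpoints and pass a negative step.
backward = list(daterange(last, first, step=datetime.timedelta(days=-1), inclusive=True))

# Weekly steps; the stop date is excluded by default.
weekly = list(daterange(first, last, step=datetime.timedelta(days=7)))

print(len(forward), backward[0], weekly)
```

Because every caller goes through one function, the day_count arithmetic from the question never has to be repeated at the call sites.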
This is the most human-readable solution I can think of.
import datetime
def daterange(start, end, step=datetime.timedelta(1)):
    curr = start
    while curr < end:
        yield curr
        curr += step
Why not try:
import datetime as dt
start_date = dt.datetime(2012, 12, 1)
end_date = dt.datetime(2012, 12, 5)
total_days = (end_date - start_date).days + 1  # inclusive 5 days
for day_number in range(total_days):
    current_date = (start_date + dt.timedelta(days=day_number)).date()
    print(current_date)
NumPy's arange function can be applied to dates:
import numpy as np
from datetime import datetime, timedelta
d0 = datetime(2009, 1,1)
d1 = datetime(2010, 1,1)
dt = timedelta(days = 1)
dates = np.arange(d0, d1, dt).astype(datetime)
The use of astype is to convert from numpy.datetime64 to an array of datetime.datetime objects.
Show the next n days starting from today:
import datetime
for i in range(0, 100):
    print((datetime.date.today() + datetime.timedelta(i)).isoformat())
Output:
2016-06-29
2016-06-30
2016-07-01
2016-07-02
2016-07-03
2016-07-04
For completeness, Pandas also has a period_range function for timestamps that are out of bounds:
import pandas as pd
pd.period_range(start='1/1/1626', end='1/08/1627', freq='D')
import datetime
def daterange(start, stop, step_days=1):
    current = start
    step = datetime.timedelta(step_days)
    if step_days > 0:
        while current < stop:
            yield current
            current += step
    elif step_days < 0:
        while current > stop:
            yield current
            current += step
    else:
        raise ValueError("daterange() step_days argument must not be zero")
if __name__ == "__main__":
    from pprint import pprint as pp
    lo = datetime.date(2008, 12, 27)
    hi = datetime.date(2009, 1, 5)
    pp(list(daterange(lo, hi)))
    pp(list(daterange(hi, lo, -1)))
    pp(list(daterange(lo, hi, 7)))
    pp(list(daterange(hi, lo, -7)))
    assert not list(daterange(lo, hi, -1))
    assert not list(daterange(hi, lo))
    assert not list(daterange(lo, hi, -7))
    assert not list(daterange(hi, lo, 7))
import datetime
for i in range(16):
    print(datetime.date.today() + datetime.timedelta(days=i))
I have a similar problem, but I need to iterate monthly instead of daily.
This is my solution:
import calendar
from datetime import datetime, timedelta
def days_in_month(dt):
    return calendar.monthrange(dt.year, dt.month)[1]
def monthly_range(dt_start, dt_end):
    forward = dt_end >= dt_start
    finish = False
    dt = dt_start
    while not finish:
        yield dt.date()
        if forward:
            days = days_in_month(dt)
            dt = dt + timedelta(days=days)
            finish = dt > dt_end
        else:
            _tmp_dt = dt.replace(day=1) - timedelta(days=1)
            dt = (_tmp_dt.replace(day=dt.day))
            finish = dt < dt_end
Example #1
date_start = datetime(2016, 6, 1)
date_end = datetime(2017, 1, 1)
for p in monthly_range(date_start, date_end):
    print(p)
Output
2016-06-01
2016-07-01
2016-08-01
2016-09-01
2016-10-01
2016-11-01
2016-12-01
2017-01-01
Example #2
date_start = datetime(2017, 1, 1)
date_end = datetime(2016, 6, 1)
for p in monthly_range(date_start, date_end):
    print(p)
Output
2017-01-01
2016-12-01
2016-11-01
2016-10-01
2016-09-01
2016-08-01
2016-07-01
2016-06-01
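The examples above all start on the first of the month. For start days late in the month, adding the current month's length can skip a month (January 31 plus 31 days lands in March). The following is a sketch of an alternative, not the answer's method: it counts whole months from the start date and clamps the day. The helper name add_months is hypothetical, and this monthly_range takes datetime.date objects rather than datetimes.

```python
import calendar
from datetime import date

def add_months(d, months):
    # Hypothetical helper: shift d by a whole number of months, clamping the
    # day to the last valid day of the target month (Jan 31 -> Feb 28/29).
    month_index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(month_index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))

def monthly_range(start, end):
    # Inclusive on both ends; runs backwards when end is earlier than start.
    step = 1 if end >= start else -1
    n = 0
    current = start
    while (current <= end) if step > 0 else (current >= end):
        yield current
        n += step
        # Always offset from the original start so clamping does not accumulate.
        current = add_months(start, n)

for p in monthly_range(date(2016, 1, 31), date(2016, 4, 30)):
    print(p)
```

Offsetting from the original start each time means a start on the 31st comes back to the 31st in months that have one, instead of sticking at the 28th or 30th after the first clamp.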
You can generate a series of dates between two dates simply and reliably using the pandas library:
import pandas as pd
print(pd.date_range(start='1/1/2010', end='1/08/2018', freq='M'))
You can change the frequency of the generated dates by setting freq to D, M, Q, or Y (daily, monthly, quarterly, yearly). Recent pandas releases spell the month-, quarter-, and year-end aliases ME, QE, and YE.
> pip install DateTimeRange
import datetime
from datetimerange import DateTimeRange
def dateRange(start, end, step):
    rangeList = []
    time_range = DateTimeRange(start, end)
    for value in time_range.range(datetime.timedelta(days=step)):
        rangeList.append(value.strftime('%m/%d/%Y'))
    return rangeList
dateRange("2018-09-07", "2018-12-25", 7)
Out[92]:
['09/07/2018',
'09/14/2018',
'09/21/2018',
'09/28/2018',
'10/05/2018',
'10/12/2018',
'10/19/2018',
'10/26/2018',
'11/02/2018',
'11/09/2018',
'11/16/2018',
'11/23/2018',
'11/30/2018',
'12/07/2018',
'12/14/2018',
'12/21/2018']
Using pendulum.period:
import pendulum
start = pendulum.from_format('2020-05-01', 'YYYY-MM-DD', formatter='alternative')
end = pendulum.from_format('2020-05-02', 'YYYY-MM-DD', formatter='alternative')
period = pendulum.period(start, end)
for dt in period:
    print(dt.to_date_string())
For those who are interested in a Pythonic functional way:
from datetime import date, timedelta
from itertools import count, takewhile
for d in takewhile(lambda x: x <= date(2009, 6, 9), map(lambda x: date(2009, 5, 30) + timedelta(days=x), count())):
    print(d)
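The one-liner above hard-codes both dates. One possible way to make it reusable is sketched below; the function name daterange_inclusive is made up for this example. It still pairs an endless count() with takewhile, which stops the stream at the first date past the end.

```python
from datetime import date, timedelta
from itertools import count, takewhile

def daterange_inclusive(start, end):
    # Lazily generate start, start + 1 day, start + 2 days, ...
    days = (start + timedelta(days=n) for n in count())
    # takewhile ends the stream at the first date after `end`.
    return takewhile(lambda d: d <= end, days)

for d in daterange_inclusive(date(2009, 5, 30), date(2009, 6, 9)):
    print(d)
```

Nothing is computed until the loop asks for the next date, so no intermediate list is built.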
This function has some extra features:
error checking in case the end is older than the start
import datetime
from datetime import timedelta
DATE_FORMAT = '%Y/%m/%d'
def daterange(start, end):
    def convert(date):
        # Accept either a DATE_FORMAT string or a date object.
        try:
            date = datetime.datetime.strptime(date, DATE_FORMAT)
            return date.date()
        except TypeError:
            return date
    def get_date(n):
        return (convert(start) + timedelta(days=n)).strftime(DATE_FORMAT)
    days = (convert(end) - convert(start)).days
    if days <= 0:
        raise ValueError('The start date must be before the end date.')
    for n in range(0, days):
        yield get_date(n)
start = '2014/12/1'
end = '2014/12/31'
print(list(daterange(start, end)))
start = datetime.date(2015, 11, 1)  # a date object works as well as a string
end = '2015/12/1'
print(list(daterange(start, end)))
Here's code for a general date range function, similar to Ber's answer, but more flexible:
import datetime
import functools
def count_timedelta(delta, step, seconds_in_interval):
    """Helper function for iterate. Finds the number of intervals in the timedelta."""
    return int(delta.total_seconds() / (seconds_in_interval * step))
def range_dt(start, end, step=1, interval='day'):
    """Iterate over datetimes or dates, similar to builtin range."""
    intervals = functools.partial(count_timedelta, (end - start), step)
    if interval == 'week':
        for i in range(intervals(3600 * 24 * 7)):
            yield start + datetime.timedelta(weeks=i) * step
    elif interval == 'day':
        for i in range(intervals(3600 * 24)):
            yield start + datetime.timedelta(days=i) * step
    elif interval == 'hour':
        for i in range(intervals(3600)):
            yield start + datetime.timedelta(hours=i) * step
    elif interval == 'minute':
        for i in range(intervals(60)):
            yield start + datetime.timedelta(minutes=i) * step
    elif interval == 'second':
        for i in range(intervals(1)):
            yield start + datetime.timedelta(seconds=i) * step
    elif interval == 'millisecond':
        for i in range(intervals(1 / 1000)):
            yield start + datetime.timedelta(milliseconds=i) * step
    elif interval == 'microsecond':
        for i in range(intervals(1e-6)):
            yield start + datetime.timedelta(microseconds=i) * step
    else:
        raise AttributeError("Interval must be 'week', 'day', 'hour', 'minute', 'second', "
                             "'millisecond' or 'microsecond'.")
from datetime import date, timedelta
delta = timedelta(days=1)
start = date(2020, 1, 1)
end = date(2020, 9, 1)
loop_date = start
while loop_date <= end:
    print(loop_date)
    loop_date += delta
You can use Arrow:
This is an example from the docs, iterating over hours:
>>> from datetime import datetime
>>> from arrow import Arrow
>>> start = datetime(2013, 5, 5, 12, 30)
>>> end = datetime(2013, 5, 5, 17, 15)
>>> for r in Arrow.range('hour', start, end):
...     print(repr(r))
...
<Arrow [2013-05-05T12:30:00+00:00]>
<Arrow [2013-05-05T13:30:00+00:00]>
<Arrow [2013-05-05T14:30:00+00:00]>
<Arrow [2013-05-05T15:30:00+00:00]>
<Arrow [2013-05-05T16:30:00+00:00]>
To iterate over days, you can use it like this:
>>> start = Arrow(2013, 5, 5)
>>> end = Arrow(2013, 5, 5)
>>> for r in Arrow.range('day', start, end):
...     print(repr(r))
(I didn't check whether you can pass datetime.date objects, but Arrow objects are easier to work with in general anyway.)
What about the following for doing a range incremented by days:
for d in map(lambda x: startDate + datetime.timedelta(days=x), range((stopDate - startDate).days)):
    pass  # Do stuff here
For a generic version:
for d in map(lambda x: startTime + x * stepTime, range(int((stopTime - startTime).total_seconds() / stepTime.total_seconds()))):
    pass  # Do stuff here
Note that .total_seconds() is only available from Python 2.7 onward. If you are stuck with an earlier version, you can write your own function:
def total_seconds( td ):
    return float(td.microseconds + (td.seconds + td.days * 24 * 3600) * 10**6) / 10**6
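As a quick sanity check, the hand-written helper can be compared against the built-in method on an interpreter that has both. This is only a sketch: the sample values and the names samples, span, step, and step_count are arbitrary choices for illustration.

```python
from datetime import timedelta

def total_seconds(td):
    # Fallback from the answer: days, seconds and microseconds folded into seconds.
    return float(td.microseconds + (td.seconds + td.days * 24 * 3600) * 10**6) / 10**6

samples = [
    timedelta(days=1),
    timedelta(hours=1, minutes=30),
    timedelta(days=-2, seconds=5),
    timedelta(microseconds=250000),
]
for td in samples:
    # The fallback should agree with the built-in method.
    assert total_seconds(td) == td.total_seconds()

# Number of whole steps that fit in a span, as used by the generic version.
span = timedelta(days=1)
step = timedelta(hours=6)
step_count = int(total_seconds(span) / total_seconds(step))
print(step_count)  # 4
```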
A slightly different approach to reversible steps: store the range args in a tuple.
from datetime import timedelta
def date_range(start, stop, step=1, inclusive=False):
    day_count = (stop - start).days
    if inclusive:
        day_count += 1
    if step > 0:
        range_args = (0, day_count, step)
    elif step < 0:
        range_args = (day_count - 1, -1, step)
    else:
        raise ValueError("date_range(): step arg must be non-zero")
    for i in range(*range_args):
        yield start + timedelta(days=i)
import datetime
from dateutil.rrule import DAILY, rrule
date = datetime.datetime(2019, 1, 10)
date1 = datetime.datetime(2019, 2, 2)
for i in rrule(DAILY, dtstart=date, until=date1):
    print(i.strftime('%Y%b%d'), sep='\n')
OUTPUT:
2019Jan10
2019Jan11
2019Jan12
2019Jan13
2019Jan14
2019Jan15
2019Jan16
2019Jan17
2019Jan18
2019Jan19
2019Jan20
2019Jan21
2019Jan22
2019Jan23
2019Jan24
2019Jan25
2019Jan26
2019Jan27
2019Jan28
2019Jan29
2019Jan30
2019Jan31
2019Feb01
2019Feb02
If you need a dynamic timedelta, you can use one of the following:
1. With a while loop
from datetime import datetime, timedelta
from typing import Generator
def datetime_range(start: datetime, end: datetime, delta: timedelta) -> Generator[datetime, None, None]:
    while start <= end:
        yield start
        start += delta
2. With a for loop
from datetime import datetime, timedelta
from typing import Generator
def datetime_range(start: datetime, end: datetime, delta: timedelta) -> Generator[datetime, None, None]:
    delta_units = int((end - start) / delta)
    for _ in range(delta_units + 1):
        yield start
        start += delta
3. If you are using async/await
from datetime import datetime, timedelta
from typing import AsyncGenerator
async def datetime_range(start: datetime, end: datetime, delta: timedelta) -> AsyncGenerator[datetime, None]:
    delta_units = int((end - start) / delta)
    for _ in range(delta_units + 1):
        yield start
        start += delta
4. List comprehension
from datetime import datetime, timedelta
from typing import List
def datetime_range(start: datetime, end: datetime, delta: timedelta) -> List[datetime]:
    delta_units = int((end - start) / delta)
    return [start + (delta * index) for index in range(delta_units + 1)]
Solutions 1 and 2 can then be used like this:
start = datetime(2020, 10, 10, 10, 0)
end = datetime(2022, 10, 10, 18, 0)
delta = timedelta(minutes=30)
result = [time_part for time_part in datetime_range(start, end, delta)]
# or
for time_part in datetime_range(start, end, delta):
    print(time_part)
The third solution can be used like this. Because it returns an async generator object, it can only be consumed in an async context:
import asyncio
async def main():
    start = datetime(2020, 10, 10, 10, 0)
    end = datetime(2022, 10, 10, 18, 0)
    delta = timedelta(minutes=30)
    result = [time_part async for time_part in datetime_range(start, end, delta)]
    async for time_part in datetime_range(start, end, delta):
        print(time_part)
asyncio.run(main())
The benefit of the solutions above is that all of them use a dynamic timedelta. This can be very useful in cases where you do not know in advance which time delta you will have.
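The step count in these solutions comes from dividing one timedelta by another, which yields a float that is then truncated with int(). Below is a sketch of a variant, not the author's code, that uses floor division instead. It assumes a positive delta; the names half_hourly and uneven are illustrative.

```python
from datetime import datetime, timedelta

def datetime_range(start, end, delta):
    # timedelta // timedelta returns an int: the number of whole steps that fit.
    # Assumes delta is positive.
    steps = (end - start) // delta
    for index in range(steps + 1):
        yield start + delta * index

start = datetime(2020, 10, 10, 10, 0)
end = datetime(2020, 10, 10, 12, 0)

half_hourly = list(datetime_range(start, end, timedelta(minutes=30)))
uneven = list(datetime_range(start, end, timedelta(minutes=45)))

print(len(half_hourly))  # 5
print(uneven[-1])        # 2020-10-10 11:30:00
```

Floor division also gives an empty result when end is earlier than start, whereas int() truncates a small negative quotient to zero and would still yield the start value once.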
The standard pandas.date_range function serves this exact purpose and can be used as a one-liner.
Just use:
pd.date_range(start=start_date.floor('d'),end=end_date.floor('d'), freq = 'd')
Note that floor is used here (ceil and round are also available) to snap the given timestamps to exact days.