How do you model something-over-time in Python?
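I'm looking for a data type to help me model resource availability over fluid time.
I've come at this problem from several directions, but I always return to the same basic obstacle: I don't know of a data type that models something as simple as an integer over time.
I could convert my appointments into time-series events (e.g. an appointment arriving means -1 availability, an appointment leaving means +1), but I still wouldn't know how to manipulate that data so that I can extract the periods where availability is greater than zero.
Someone cast a close vote citing a lack of focus, but my goal here seems pretty singular, so I'll try to explain the problem graphically. I'm trying to infer the periods of time where the number of active jobs falls below a given capacity.
Turn a range of known parallel capacity (e.g. 3 between 9 and 6) and a list of jobs with variable start/end times into a list of time ranges of available time.
My approach would be to build the time series, but include the availability objects, each with a value set to the availability in that period.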
availability:
[
    {
        "start": 09:00,
        "end": 12:00,
        "value": 4
    },
    {
        "start": 12:00,
        "end": 13:00,
        "value": 3
    }
]
data: [
    {
        "start": 10:00,
        "end": 10:30,
    }
]
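Index the time series on the start/end times, with the value as the value. The start time of an availability period is +value and its end time is -value. For an event it is -1 or +1, as you said.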
"09:00" 4
"10:00" -1
"10:30" 1
"12:00" -4
"12:00" 3
"13:00" -3
Then group by index, sum, and take the cumulative sum.
This gives:
"09:00" 4
"10:00" 3
"10:30" 4
"12:00" 3
"13:00" 0
Example code in pandas:
import numpy as np
import pandas as pd
data = [
    {"start": "10:00", "end": "10:30"},
]
breakpoints = [
    {"start": "00:00", "end": "09:00", "value": 0},
    {"start": "09:00", "end": "12:00", "value": 4},
    {"start": "12:00", "end": "12:30", "value": 4},
    {"start": "12:30", "end": "13:00", "value": 3},
    {"start": "13:00", "end": "00:00", "value": 0},
]
df = pd.DataFrame(data, columns=['start', 'end'])
print(df.head(5))
starts = pd.DataFrame(data, columns=['start'])
starts["value"] = -1
starts = starts.set_index("start")
ends = pd.DataFrame(data, columns=['end'])
ends["value"] = 1
ends = ends.set_index("end")
breakpointsStarts = pd.DataFrame(breakpoints, columns=['start', 'value']).set_index("start")
breakpointsEnds = pd.DataFrame(breakpoints, columns=['end', 'value'])
breakpointsEnds["value"] = breakpointsEnds["value"].transform(lambda x: -x)
breakpointsEnds = breakpointsEnds.set_index("end")
countsDf = pd.concat([starts, ends, breakpointsEnds, breakpointsStarts]).sort_index()
countsDf = countsDf.groupby(countsDf.index).sum().cumsum()
print(countsDf)
# Periods that are available
df = countsDf
df["available"] = df["value"] > 0
# Indexes where the value of available changes
# Alternatively swap out available for the value.
time_changes = df["available"].diff()[df["available"].diff() != 0].index.values
newDf = pd.DataFrame(time_changes, columns= ["start"])
# Setting the end column to the value of the next start
newDf['end'] = newDf.transform(np.roll, shift=-1)
print(newDf)
# Join this back in to get the actual value of available
mergedDf = newDf.merge(df, left_on="start", right_index=True)
print(mergedDf)
Which finally returns:
start end value available
0 00:00 09:00 0 False
1 09:00 13:00 4 True
2 13:00 00:00 0 False
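The group, sum and cumulative-sum step of this approach can also be sketched without pandas. The following is a minimal illustration using the small six-event table from the explanation above; the function name `running_availability` is made up for this sketch and is not part of the original answer.

```python
from collections import defaultdict
from itertools import accumulate

# (time, delta) events; zero-padded "HH:MM" strings sort chronologically.
events = [
    ("09:00", 4), ("12:00", -4),   # first availability window
    ("12:00", 3), ("13:00", -3),   # second availability window
    ("10:00", -1), ("10:30", 1),   # one job
]

def running_availability(events):
    """Group the deltas by time, then take the cumulative sum in time order."""
    grouped = defaultdict(int)
    for when, delta in events:
        grouped[when] += delta
    times = sorted(grouped)
    totals = accumulate(grouped[t] for t in times)
    return list(zip(times, totals))

levels = running_availability(events)
print(levels)
```

Each pair in `levels` is a change point and the availability that holds from that time until the next change point.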
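I would treat it just like the appointments. Model the free time as appointments in their own right. For each appointment that ends, check whether another one is still in progress; if so, skip it. If not, find the next appointment that starts (the one whose start date is greater than this end date).
After you have iterated over all the appointments, you should have an inverted mask of them.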
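The inverted-mask idea can be sketched roughly as follows. This is only one possible reading of the description: it assumes times are given as minutes since midnight and that a moment counts as busy as soon as a single appointment covers it (a capacity of one). The name `free_gaps` is hypothetical.

```python
def free_gaps(appointments, day_start, day_end):
    """Return the gaps between appointments as (start, end) tuples.

    Times are minutes since midnight. A moment is busy when at least one
    appointment covers it, so this assumes a capacity of one.
    """
    gaps = []
    cursor = day_start                      # end of the busy block seen so far
    for start, end in sorted(appointments):
        if start > cursor:                  # nothing in progress: a gap opens
            gaps.append((cursor, min(start, day_end)))
        cursor = max(cursor, end)           # extend the current busy block
        if cursor >= day_end:
            break
    if cursor < day_end:                    # free time after the last appointment
        gaps.append((cursor, day_end))
    return gaps

# 10:00-11:00 and 10:30-12:00 overlap; 14:00-15:00 stands alone.
gaps = free_gaps([(600, 660), (630, 720), (840, 900)], day_start=540, day_end=1080)
print(gaps)
```

Sorting by start time means "is another appointment still in progress?" reduces to comparing the next start against the furthest end seen so far.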
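To me, this problem would be well represented by a list of boolean values. For ease of explanation, let's assume the length of every potential job is a multiple of 15 minutes. So, from 9 to 6, we have 36 "time slots" (9 hours at 4 slots per hour) whose availability we want to track. We represent the availability of a queue in a time slot with a boolean variable: False if the queue is processing a job, True if the queue is available.
First, we create a list of time slots for each queue as well as for the output. So every queue and the output has time slots t_k, 1 <= k <= 36.
Then, given five job queues q_j, 1 <= j <= 5, we say that t_k is "open" at time k if there is at least one q_j whose time slot list is True at index k.
We can implement this in standalone Python as follows:
N_SLOTS = 36  # 09:00 to 18:00 in 15-minute slots
# One independent list per queue; [slots] * 5 would alias a single list five times.
queues = [[True] * N_SLOTS for _ in range(5)]
output = [False] * N_SLOTS

def available(k):
    for q in queues:
        if q[k]:
            return True
    return False

We can then assume there is some function dispatch(length) that assigns a job to an available queue, setting the appropriate slots in queue[q] to False.
Finally, to update the output, we simply call:
def update():
    for k in range(N_SLOTS):
        output[k] = available(k)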
Or, to be more efficient:
def update(i, j):
    for k in range(i, j):
        output[k] = available(k)
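Then you can simply call update(i, j) whenever dispatch() updates time slots i to j for a new job. This way, dispatching and updating is an O(n) operation, where n is the number of time slots being changed, regardless of how many time slots there are in total.
This would allow you to write a simple function that maps human-readable times to ranges of time slot indices, which would also let you make the time slots larger or smaller as needed.
You could also easily extend this idea to a pandas data frame where each column is one queue, allowing you to use Series.any() on every row at once to quickly update the output column.
I'd love to hear suggestions about this approach! Perhaps I've missed some complexity of the problem, but I think this is a nice solution.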
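As a rough sketch of the time-to-slot mapping and of a possible dispatch function, consider the following. The names `slot_index`, `dispatch` and `open_slots`, the "HH:MM" input format, the three queues and the first-fit policy are all assumptions made for illustration; the answer only states that such functions could exist.

```python
SLOT_MINUTES = 15                  # assumed slot length
DAY_START = 9 * 60                 # 09:00 in minutes since midnight
DAY_END = 18 * 60                  # 18:00
N_SLOTS = (DAY_END - DAY_START) // SLOT_MINUTES

def slot_index(hhmm):
    """Map an "HH:MM" string to the index of the slot that starts at that time."""
    hours, minutes = map(int, hhmm.split(":"))
    offset = hours * 60 + minutes - DAY_START
    if offset % SLOT_MINUTES or not 0 <= offset <= DAY_END - DAY_START:
        raise ValueError(f"{hhmm} is not on a slot boundary within the day")
    return offset // SLOT_MINUTES

# One independent list per queue; True means the queue is free in that slot.
queues = [[True] * N_SLOTS for _ in range(3)]

def dispatch(start, end):
    """Book the first queue that is free for the whole range; return its number."""
    i, j = slot_index(start), slot_index(end)
    for number, queue in enumerate(queues):
        if all(queue[i:j]):
            queue[i:j] = [False] * (j - i)
            return number
    raise ValueError("no queue is free for the whole range")

def open_slots():
    """A slot is open when at least one queue is free in it."""
    return [any(column) for column in zip(*queues)]

# Three jobs over the same hour fill all three queues for that hour.
lanes = [dispatch("10:00", "11:00") for _ in range(3)]
is_open = open_slots()
```

Changing `SLOT_MINUTES` rescales the whole model, which is the flexibility the answer alludes to.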
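You can use (datetime, increment) tuples to keep track of the changes in availability. A job-start event has increment = -1 (one lane fewer is available) and a job-end event has increment = +1. Then itertools.accumulate allows for computing the cumulative availability as jobs start and end over time. Here's an example implementation: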
from datetime import time
import itertools as it

def compute_availability(jobs, opening_hours, capacity):
    jobs = [((x, -1), (y, +1)) for x, y in jobs]
    opens, closes = opening_hours
    events = [[opens, capacity]] + sorted(t for job in jobs for t in job) + [(closes, 0)]
    availability = list(it.accumulate(events,
                                      lambda x, y: [y[0], x[1] + y[1]]))
    for x, y in zip(availability, availability[1:]):
        # If multiple events happen at the same time, only yield the last one.
        if y[0] > x[0]:
            yield x
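This adds artificial (opens, capacity) and (closes, 0) events to initialize the computation. The example above considers a single day, but it is easy to extend it to multiple days by creating opens and closes datetime objects that share the day of the first and the last job respectively.
Here's the output for the OP's example schedule: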
from pprint import pprint
jobs = [(time(10), time(15)),
(time(9), time(11)),
(time(12, 30), time(16)),
(time(10), time(18))]
availability = list(compute_availability(
jobs, opening_hours=(time(9), time(18)), capacity=3
))
pprint(availability)
Which prints:
[[datetime.time(9, 0), 2],
[datetime.time(10, 0), 0],
[datetime.time(11, 0), 1],
[datetime.time(12, 30), 0],
[datetime.time(15, 0), 1],
[datetime.time(16, 0), 2]]
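The first element indicates when the availability changes and the second element is the availability that results from that change. For example, one job is submitted at 9am, which drops the availability from 3 to 2, and then two more jobs are submitted at 10am while the first one is still running (hence availability drops to 0).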
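To get from this list of change points to the periods the question asks for (those with availability greater than zero), one option is to pair each change point with the next one. The helper below is a sketch under that assumption; `free_periods` and its `closes` parameter are hypothetical names, and the input list is copied from the printed output above.

```python
from datetime import time

def free_periods(availability, closes):
    """Yield (start, end, lanes) for every stretch with at least one free lane.

    `availability` is a sorted list of [time, free_lanes] change points and
    `closes` ends the last stretch. Adjacent stretches with different lane
    counts are reported separately.
    """
    ends = [when for when, _ in availability[1:]] + [closes]
    for (start, lanes), end in zip(availability, ends):
        if lanes > 0:
            yield start, end, lanes

availability = [[time(9), 2], [time(10), 0], [time(11), 1],
                [time(12, 30), 0], [time(15), 1], [time(16), 2]]
periods = list(free_periods(availability, closes=time(18)))
for start, end, lanes in periods:
    print(f"{start:%H:%M}-{end:%H:%M}: {lanes} free")
```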
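Now that we have computed the initial availability, an important aspect is updating it as new jobs are added. Here it is preferable not to recompute the availability from the full job list, since that may be expensive when many jobs are being tracked. Because availability is already sorted, we can use the bisect module to determine the relevant update range in O(log(N)). Then a number of steps need to be performed. Let's say the job is scheduled as [x, y], where x and y are two datetime objects.
1. Check whether the availability in the [x, y] interval is greater than zero (including the event to the left of x, i.e. the previous event).
2. Decrease the availability of all events in [x, y] by 1.
3. If x is not in the list of events, we need to add it; otherwise we need to check whether the x event can be merged with the one to its left.
4. If y is not in the list of events, we need to add it.
Here is the relevant code: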
import bisect

def add_job(availability, job, *, weight=1):
    """weight: how many lanes the job requires"""
    job = list(job)
    start = bisect.bisect(availability, job[:1])
    # Emulate a `bisect_right`, which doesn't work directly since
    # we're comparing lists of different length.
    if start < len(availability):
        start += (job[0] == availability[start][0])
    stop = bisect.bisect(availability, job[1:])

    if any(slot[1] < weight for slot in availability[start-1:stop]):
        raise ValueError('The requested time slot is not available')

    for slot in availability[start:stop]:
        slot[1] -= weight

    if job[0] > availability[start-1][0]:
        previous_availability = availability[start-1][1]
        availability.insert(start, [job[0], previous_availability - weight])
        stop += 1
    else:
        availability[start-1][1] -= weight
        if start >= 2 and availability[start-1][1] == availability[start-2][1]:
            del availability[start-1]
            stop -= 1

    if stop == len(availability) or job[1] < availability[stop][0]:
        previous_availability = availability[stop-1][1]
        availability.insert(stop, [job[1], previous_availability + weight])
We can test it by adding a few jobs to the OP's example schedule:
for job in [[time(15), time(17)],
            [time(11, 30), time(12)],
            [time(13), time(14)]]:  # this one should raise since availability is zero
    print(f'\nAdding {job = }')
    add_job(availability, job)
    pprint(availability)
Output:
Adding job = [datetime.time(15, 0), datetime.time(17, 0)]
[[datetime.time(9, 0), 2],
[datetime.time(10, 0), 0],
[datetime.time(11, 0), 1],
[datetime.time(12, 30), 0],
[datetime.time(16, 0), 1],
[datetime.time(17, 0), 2]]
Adding job = [datetime.time(11, 30), datetime.time(12, 0)]
[[datetime.time(9, 0), 2],
[datetime.time(10, 0), 0],
[datetime.time(11, 0), 1],
[datetime.time(11, 30), 0],
[datetime.time(12, 0), 1],
[datetime.time(12, 30), 0],
[datetime.time(16, 0), 1],
[datetime.time(17, 0), 2]]
Adding job = [datetime.time(13, 0), datetime.time(14, 0)]
Traceback (most recent call last):
[...]
ValueError: The requested time slot is not available
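We can also use this interface to block all lanes during the hours when the service is unavailable (e.g. from 6pm to 9am the next day). Just submit a job with weight=capacity for that time span:
# Assumes `from datetime import datetime` and an availability list built from datetime objects.
add_job(availability,
        [datetime(2020, 3, 14, 18), datetime(2020, 3, 15, 9)],
        weight=3)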
We can also use add_job to build a complete schedule from scratch:
availability = list(compute_availability(
    [], opening_hours=(time(9), time(18)), capacity=3
))
print('Initial availability')
pprint(availability)
for job in jobs:
    print(f'\nAdding {job = }')
    add_job(availability, job)
    pprint(availability)
Output:
Initial availability
[[datetime.time(9, 0), 3]]
Adding job = (datetime.time(10, 0), datetime.time(15, 0))
[[datetime.time(9, 0), 3],
[datetime.time(10, 0), 2],
[datetime.time(15, 0), 3]]
Adding job = (datetime.time(9, 0), datetime.time(11, 0))
[[datetime.time(9, 0), 2],
[datetime.time(10, 0), 1],
[datetime.time(11, 0), 2],
[datetime.time(15, 0), 3]]
Adding job = (datetime.time(12, 30), datetime.time(16, 0))
[[datetime.time(9, 0), 2],
[datetime.time(10, 0), 1],
[datetime.time(11, 0), 2],
[datetime.time(12, 30), 1],
[datetime.time(15, 0), 2],
[datetime.time(16, 0), 3]]
Adding job = (datetime.time(10, 0), datetime.time(18, 0))
[[datetime.time(9, 0), 2],
[datetime.time(10, 0), 0],
[datetime.time(11, 0), 1],
[datetime.time(12, 30), 0],
[datetime.time(15, 0), 1],
[datetime.time(16, 0), 2],
[datetime.time(18, 0), 3]]
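Unless your time resolution is finer than one minute, I would suggest using a map of the minutes of the day and assigning a set of jobIds to every minute within each job's time span.
For example: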
# convert time to minute of the day (assumes 24H time, but you can make this your own way)
def toMinute(time):
    return sum(p*t for p,t in zip(map(int,time.split(":")),(60,1)))

def toTime(minute):
    return f"{minute//60}:{minute%60:02d}"

# booking a job adds it to all minutes covered by its duration
def book(timeMap,jobId,start,duration):
    startMin = toMinute(start)
    for m in range(startMin,startMin+duration):
        timeMap[m].add(jobId)

# unbooking a job removes it from all minutes where it was present
def unbook(timeMap,jobId):
    for s in timeMap:
        s.discard(jobId)

# return time ranges for minutes meeting a given condition
def minuteSpans(timeMap,condition,start="09:00",end="18:00"):
    start,end = toMinute(start),toMinute(end)
    timeRange = timeMap[start:end]
    match = [condition(s) for s in timeRange]
    breaks = [True] + [a!=b for a,b in zip(match,match[1:])]
    starts = [i for (i,a),b in zip(enumerate(match),breaks) if b]
    return [(start+s,start+e) for s,e in zip(starts,starts[1:]+[len(match)]) if match[s]]

def timeSpans(timeMap,condition,start="09:00",end="18:00"):
    return [(toTime(s),toTime(e)) for s,e in minuteSpans(timeMap,condition,start,end)]

# availability is ranges of minutes where the number of jobs is less than your capacity
def available(timeMap,start="09:00",end="18:00",maxJobs=5):
    return timeSpans(timeMap,lambda s:len(s)<maxJobs,start,end)
Example usage:
timeMap = [set() for _ in range(1440)]
book(timeMap,"job1","9:45",25)
book(timeMap,"job2","9:30",45)
book(timeMap,"job3","9:00",90)
print(available(timeMap,maxJobs=3))
[('9:00', '9:45'), ('10:10', '18:00')]
print(timeSpans(timeMap,lambda s:"job3" in s))
[('9:00', '10:30')]
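With a few adjustments, you could even have discontinuous jobs that skip over certain periods (e.g. lunch time). You can also block out certain periods by placing fake jobs in them.
If you need to manage job queues individually, you can have separate time maps (one per queue) and combine them into one when you need the global picture: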
print(available(timeMap1,maxJobs=1))
print(available(timeMap2,maxJobs=1))
print(available(timeMap3,maxJobs=1))
globalMap = list(set.union(*qs) for qs in zip(timeMap1,timeMap2,timeMap3))
print(available(globalMap,maxJobs=3))
Put all of this into a TimeMap class (instead of individual functions) and you should have a pretty good toolset to work with.
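One way such a class could look is sketched below. The method names, the `max_jobs` attribute and the use of `itertools.groupby` to find runs of matching minutes are choices made for this sketch, not something the answer specifies. Unlike the function version, booking past midnight is silently truncated here because it relies on list slicing.

```python
from itertools import groupby

class TimeMap:
    """Minute-of-day map; each minute holds the set of job ids active in it."""

    def __init__(self, max_jobs=1):
        self.max_jobs = max_jobs
        self.minutes = [set() for _ in range(24 * 60)]

    @staticmethod
    def to_minute(hhmm):
        hours, minutes = map(int, hhmm.split(":"))
        return hours * 60 + minutes

    @staticmethod
    def to_time(minute):
        return f"{minute // 60}:{minute % 60:02d}"

    def book(self, job_id, start, duration):
        """Add the job to every minute it covers (truncated at midnight)."""
        first = self.to_minute(start)
        for slot in self.minutes[first:first + duration]:
            slot.add(job_id)

    def unbook(self, job_id):
        for slot in self.minutes:
            slot.discard(job_id)

    def spans(self, condition, start="09:00", end="18:00"):
        """Return (start, end) strings for runs of minutes meeting `condition`."""
        first, last = self.to_minute(start), self.to_minute(end)
        result = []
        position = first
        for matches, run in groupby(self.minutes[first:last], key=condition):
            length = sum(1 for _ in run)
            if matches:
                result.append((self.to_time(position), self.to_time(position + length)))
            position += length
        return result

    def available(self, start="09:00", end="18:00"):
        return self.spans(lambda jobs: len(jobs) < self.max_jobs, start, end)

tm = TimeMap(max_jobs=3)
tm.book("job1", "9:45", 25)
tm.book("job2", "9:30", 45)
tm.book("job3", "9:00", 90)
print(tm.available())
print(tm.spans(lambda jobs: "job3" in jobs))
```

With the same three bookings as in the example usage above, `available()` should report the same two free ranges.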
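You can use a dedicated class that represents the lanes on which jobs can run. These objects can keep track of the jobs and thus of their own availability: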
import bisect
from datetime import time
from functools import total_ordering
import math

@total_ordering
class TimeSlot:
    def __init__(self, start, stop, lane):
        self.start = start
        self.stop = stop
        self.lane = lane

    def __contains__(self, other):
        return self.start <= other.start and self.stop >= other.stop

    def __lt__(self, other):
        return (self.start, -self.stop.second) < (other.start, -other.stop.second)

    def __eq__(self, other):
        return (self.start, -self.stop.second) == (other.start, -other.stop.second)

    def __str__(self):
        return f'({self.lane}) {[self.start, self.stop]}'

    __repr__ = __str__

class Lane:
    @total_ordering
    class TimeHorizon:
        def __repr__(self):
            return '...'

        def __lt__(self, other):
            return False

        def __eq__(self, other):
            return False

        @property
        def second(self):
            return math.inf

        @property
        def timestamp(self):
            return math.inf

    time_horizon = TimeHorizon()
    del TimeHorizon

    def __init__(self, start, nr):
        self.nr = nr
        self.availability = [TimeSlot(start, self.time_horizon, self)]

    # Print a lane as its number so that slots render as "(1) [...]" in the output.
    def __str__(self):
        return str(self.nr)

    def add_job(self, job):
        if not isinstance(job, TimeSlot):
            job = TimeSlot(*job, self)
        # We want to bisect_right but only on the start time,
        # so we need to do it manually if they are equal.
        index = bisect.bisect_left(self.availability, job)
        if index < len(self.availability):
            index += (job.start == self.availability[index].start)
        index -= 1  # select the corresponding free slot
        slot = self.availability[index]
        if slot.start > job.start or slot.stop is not self.time_horizon and job.stop > slot.stop:
            raise ValueError('Requested time slot not available')
        if job == slot:
            del self.availability[index]
        elif job.start == slot.start:
            slot.start = job.stop
        elif job.stop == slot.stop:
            slot.stop = job.start
        else:
            slot_end = slot.stop
            slot.stop = job.start
            self.availability.insert(index+1, TimeSlot(job.stop, slot_end, self))
The Lane objects can be used as follows:
lane = Lane(start=time(9), nr=1)
print(lane.availability)
lane.add_job([time(11), time(14)])
print(lane.availability)
Output:
[(1) [datetime.time(9, 0), ...]]
[(1) [datetime.time(9, 0), datetime.time(11, 0)],
(1) [datetime.time(14, 0), ...]]
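After adding the job, the availability is updated as well.
Now we can use several of these lane objects together to represent a complete schedule. Jobs can be added as required and the availability will be updated automatically: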
class Schedule:
    def __init__(self, n_lanes, start):
        self.lanes = [Lane(start, nr=i) for i in range(n_lanes)]

    def add_job(self, job):
        for lane in self.lanes:
            try:
                lane.add_job(job)
            except ValueError:
                pass
            else:
                break

from pprint import pprint

# Example jobs from OP.
jobs = [(time(10), time(15)),
        (time(9), time(11)),
        (time(12, 30), time(16)),
        (time(10), time(18))]
schedule = Schedule(3, start=time(9))
for job in jobs:
    schedule.add_job(job)
for lane in schedule.lanes:
    pprint(lane.availability)
Output:
[(0) [datetime.time(9, 0), datetime.time(10, 0)],
(0) [datetime.time(15, 0), ...]]
[(1) [datetime.time(11, 0), datetime.time(12, 30)],
(1) [datetime.time(16, 0), ...]]
[(2) [datetime.time(9, 0), datetime.time(10, 0)],
(2) [datetime.time(18, 0), ...]]
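We can create a dedicated tree-like structure that keeps track of the time slots of all lanes, in order to select the best-fitting slot when registering a new job. A node in the tree represents a single time slot and its children are all the time slots contained within that slot. Then, when registering a new job, we can search the tree to find the optimal slot. The tree and the lanes share the same time slots, so we only need to adjust the tree manually when slots are deleted or newly inserted. Here is the relevant code, which is a bit lengthy (just a quick draft):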
import itertools as it

class OneStepBuffered:
    """Can back up elements that are consumed by `it.takewhile`.
    From: https://stackoverflow.com/a/30615837/3767239
    """
    _sentinel = object()

    def __init__(self, it):
        self._it = iter(it)
        self._last = self._sentinel
        self._next = self._sentinel

    def __iter__(self):
        return self

    def __next__(self):
        sentinel = self._sentinel
        if self._next is not sentinel:
            next_val, self._next = self._next, sentinel
            return next_val
        try:
            self._last = next(self._it)
            return self._last
        except StopIteration:
            self._last = self._next = sentinel
            raise

    def step_back(self):
        if self._last is self._sentinel:
            raise ValueError("Can't back up a step")
        self._next, self._last = self._last, self._sentinel

class SlotTree:
    def __init__(self, slot, subslots, parent=None):
        self.parent = parent
        self.slot = slot
        self.subslots = []
        slots = OneStepBuffered(subslots)
        for slot in slots:
            subslots = it.takewhile(lambda x: x.stop <= slot.stop, slots)
            self.subslots.append(SlotTree(slot, subslots, self))
            try:
                slots.step_back()
            except ValueError:
                break

    def __str__(self):
        sub_repr = ['\n| '.join(str(slot).split('\n'))
                    for slot in self.subslots]
        sub_repr = [f'| {x}' for x in sub_repr]
        sub_repr = '\n'.join(sub_repr)
        sep = '\n' if sub_repr else ''
        return f'{self.slot}{sep}{sub_repr}'

    def find_minimal_containing_slot(self, slot):
        try:
            return min(self.find_containing_slots(slot),
                       key=lambda x: x.slot.stop.second - x.slot.start.second)
        except ValueError:
            raise ValueError('Requested time slot not available') from None

    def find_containing_slots(self, slot):
        for candidate in self.subslots:
            if slot in candidate.slot:
                yield from candidate.find_containing_slots(slot)
                yield candidate

    @classmethod
    def from_slots(cls, slots):
        # Ascending in start time, descending in stop time (secondary).
        return cls(cls.__name__, sorted(slots))

class Schedule:
    def __init__(self, n_lanes, start):
        self.lanes = [Lane(start, i+1) for i in range(n_lanes)]
        self.slots = SlotTree.from_slots(
            s for lane in self.lanes for s in lane.availability)

    def add_job(self, job):
        if not isinstance(job, TimeSlot):
            job = TimeSlot(*job, lane=None)
        # Minimal containing slot is one possible strategy,
        # others can be implemented as well.
        slot = self.slots.find_minimal_containing_slot(job)
        lane = slot.slot.lane
        if job == slot.slot:
            slot.parent.subslots.remove(slot)
        elif job.start > slot.slot.start and job.stop < slot.slot.stop:
            slot.parent.subslots.insert(
                slot.parent.subslots.index(slot) + 1,
                SlotTree(TimeSlot(job.stop, slot.slot.stop, lane), [], slot.parent))
        lane.add_job(job)
Now we can use the Schedule class to automatically assign jobs to lanes and update their availability:
if __name__ == '__main__':
    jobs = [(time(10), time(15)),  # example from OP
            (time(9), time(11)),
            (time(12, 30), time(16)),
            (time(10), time(18))]
    schedule = Schedule(3, start=time(9))
    print(schedule.slots, end='\n\n')
    for job in jobs:
        print(f'Adding {TimeSlot(*job, "new slot")}')
        schedule.add_job(job)
        print(schedule.slots, end='\n\n')
Output:
SlotTree
| (1) [datetime.time(9, 0), ...]
| (2) [datetime.time(9, 0), ...]
| (3) [datetime.time(9, 0), ...]
Adding (new slot) [datetime.time(10, 0), datetime.time(15, 0)]
SlotTree
| (1) [datetime.time(9, 0), datetime.time(10, 0)]
| (1) [datetime.time(15, 0), ...]
| (2) [datetime.time(9, 0), ...]
| (3) [datetime.time(9, 0), ...]
Adding (new slot) [datetime.time(9, 0), datetime.time(11, 0)]
SlotTree
| (1) [datetime.time(9, 0), datetime.time(10, 0)]
| (1) [datetime.time(15, 0), ...]
| (2) [datetime.time(11, 0), ...]
| (3) [datetime.time(9, 0), ...]
Adding (new slot) [datetime.time(12, 30), datetime.time(16, 0)]
SlotTree
| (1) [datetime.time(9, 0), datetime.time(10, 0)]
| (1) [datetime.time(15, 0), ...]
| (2) [datetime.time(11, 0), datetime.time(12, 30)]
| (2) [datetime.time(16, 0), ...]
| (3) [datetime.time(9, 0), ...]
Adding (new slot) [datetime.time(10, 0), datetime.time(18, 0)]
SlotTree
| (1) [datetime.time(9, 0), datetime.time(10, 0)]
| (1) [datetime.time(15, 0), ...]
| (2) [datetime.time(11, 0), datetime.time(12, 30)]
| (2) [datetime.time(16, 0), ...]
| (3) [datetime.time(9, 0), datetime.time(10, 0)]
| (3) [datetime.time(18, 0), ...]
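The numbers (i) indicate the lane number and the [] indicate the available time slots on that lane. A ... indicates an "open end" (the time horizon). As we can see, the tree doesn't restructure itself when time slots are adjusted; this would be a possible improvement. Ideally, for each new job, the corresponding best-fitting time slot would be popped from the tree and then, depending on how the job fits into the slot, an adjusted version and possibly new slots would be pushed back onto the tree (or nothing at all, if the job fits the slot exactly).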
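The pop-and-push-back idea can be illustrated without the tree. The sketch below keeps the free slots in a flat sorted list instead, so it is a simplification of the structure described above rather than a drop-in replacement. It assumes times are minutes since midnight, uses `math.inf` for the open end, and breaks ties between open-ended slots by preferring the later start; `book_best_fit` is a hypothetical name.

```python
import math

def book_best_fit(free_slots, start, stop):
    """Book [start, stop) into the tightest free slot that contains it.

    `free_slots` is a list of (start, stop, lane) tuples in minutes since
    midnight, with stop == math.inf for an open-ended slot. The chosen slot
    is removed and whatever is left of it is pushed back. Returns the lane.
    """
    candidates = [s for s in free_slots if s[0] <= start and stop <= s[1]]
    if not candidates:
        raise ValueError("Requested time slot not available")
    # Minimal containing slot; among equally wide slots prefer the later start.
    best = min(candidates, key=lambda s: (s[1] - s[0], -s[0]))
    free_slots.remove(best)
    slot_start, slot_stop, lane = best
    if slot_start < start:
        free_slots.append((slot_start, start, lane))   # remainder before the job
    if stop < slot_stop:
        free_slots.append((stop, slot_stop, lane))     # remainder after the job
    free_slots.sort()
    return lane

# Three lanes open from 09:00; jobs are the OP's example in minutes.
free_slots = [(540, math.inf, lane) for lane in (1, 2, 3)]
jobs = [(600, 900), (540, 660), (750, 960), (600, 1080)]
lanes = [book_best_fit(free_slots, start, stop) for start, stop in jobs]
print(lanes)
print(free_slots)
```

Because every booking removes the old slot and re-inserts only the remainders, the list never holds stale entries, which is the property the tree version lacks.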
The examples above consider only a single day and time objects, but the code is easy to extend for use with datetime objects.