How do you model something-over-time in Python?
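I'm looking for a data type to help me model resource availability over a flowing period of time.
I've attacked this problem from several directions, but I always come back to the same basic issue: I don't know which data type to use to model something as simple as an integer that changes over time.
I can convert my appointments into time-series events (e.g. an appointment arriving means -1 availability, an appointment leaving means +1), but I still don't know how to manipulate that data so that I can extract the periods where availability is greater than zero.
Someone cast a close vote citing lack of focus, but my goal here seems pretty singular, so I'll try to explain the problem graphically. I'm trying to infer the time periods where the number of active jobs is below a given capacity.
That is: turn a known parallel capacity (e.g. 3 between 9 and 6) and a list of jobs with variable start/end times into a list of time ranges during which capacity is available.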
My approach is to build a time series, but to include an availability object whose value is set to the availability for that period.
availability:
[
    {"start": "09:00", "end": "12:00", "value": 4},
    {"start": "12:00", "end": "13:00", "value": 3}
]
data: [
    {"start": "10:00", "end": "10:30"}
]
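Build a time-series index on the start/end times, with the value as the value. For an availability window, the start time gets +value and the end time gets -value. For the job events, as you said, it is -1 (start) or +1 (end).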
"09:00" 4
"10:00" -1
"10:30" 1
"12:00" -4
"12:00" 3
"13:00" -3
Then group by index, sum, and take the cumulative sum.
Which gives:
"09:00" 4
"10:00" 3
"10:30" 4
"12:00" 3
"13:00" 0
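The group, sum and cumulative-sum steps can also be sketched in plain Python without pandas. This sketch assumes zero-padded "HH:MM" strings, so that sorting them as text is also chronological.

```python
from collections import defaultdict
from itertools import accumulate

# (time, delta) pairs from the example above: an availability window adds
# +value at its start and -value at its end; each job adds -1 / +1.
deltas = [("09:00", 4), ("12:00", -4), ("12:00", 3), ("13:00", -3),
          ("10:00", -1), ("10:30", 1)]

# Group by timestamp and sum the deltas that share a timestamp.
grouped = defaultdict(int)
for t, d in deltas:
    grouped[t] += d
times = sorted(grouped)  # zero-padded "HH:MM" sorts chronologically

# The running (cumulative) sum is the availability from each time onward.
levels = list(accumulate(grouped[t] for t in times))
print(list(zip(times, levels)))
# [('09:00', 4), ('10:00', 3), ('10:30', 4), ('12:00', 3), ('13:00', 0)]
```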
Example code in pandas:
import numpy as np
import pandas as pd
data = [
    {"start": "10:00", "end": "10:30"},
]
breakpoints = [
    {"start": "00:00", "end": "09:00", "value": 0},
    {"start": "09:00", "end": "12:00", "value": 4},
    {"start": "12:00", "end": "12:30", "value": 4},
    {"start": "12:30", "end": "13:00", "value": 3},
    {"start": "13:00", "end": "00:00", "value": 0},
]
df = pd.DataFrame(data, columns=['start', 'end'])
print(df.head(5))
starts = pd.DataFrame(data, columns=['start'])
starts["value"] = -1
starts = starts.set_index("start")
ends = pd.DataFrame(data, columns=['end'])
ends["value"] = 1
ends = ends.set_index("end")
breakpointsStarts = pd.DataFrame(breakpoints, columns=['start', 'value']).set_index("start")
breakpointsEnds = pd.DataFrame(breakpoints, columns=['end', 'value'])
breakpointsEnds["value"] = breakpointsEnds["value"].transform(lambda x: -x)
breakpointsEnds = breakpointsEnds.set_index("end")
countsDf = pd.concat([starts, ends, breakpointsEnds, breakpointsStarts]).sort_index()
countsDf = countsDf.groupby(countsDf.index).sum().cumsum()
print(countsDf)
# Periods that are available
df = countsDf
df["available"] = df["value"] > 0
# Indexes where the value of available changes
# Alternatively swap out available for the value.
time_changes = df["available"].diff()[df["available"].diff() != 0].index.values
newDf = pd.DataFrame(time_changes, columns= ["start"])
# Set the end column to the value of the next start (wrapping around at the end)
newDf["end"] = np.roll(newDf["start"].to_numpy(), -1)
print(newDf)
# Join this back in to get the actual value of available
mergedDf = newDf.merge(df, left_on="start", right_index=True)
print(mergedDf)
Which finally returns:
start end value available
0 00:00 09:00 0 False
1 09:00 13:00 4 True
2 13:00 00:00 0 False
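I would treat it the same way as the appointments: model the free time as separate appointments. For each appointment that ends, check whether another one is still in progress. If so, skip it. If not, look for the next appointment that starts (the one whose start time is later than this end time).
After you've iterated over all the appointments, you should have an inverted mask.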
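Here is a minimal sketch of that idea for a single lane (capacity 1). It assumes appointments are (start, end) pairs of comparable values that lie within the day, and `free_gaps` is a hypothetical name. Sorting by start and tracking the latest end seen so far is the same as checking whether another appointment is still in progress.

```python
def free_gaps(appointments, day_start, day_end):
    """Return the gaps in [day_start, day_end) not covered by any appointment.

    Sketch assuming capacity 1; appointments are (start, end) pairs of
    comparable values such as zero-padded "HH:MM" strings or datetimes.
    """
    gaps = []
    cursor = day_start  # end of the busy time seen so far
    for start, end in sorted(appointments):
        if start > cursor:
            # Nothing is in progress between cursor and this start.
            gaps.append((cursor, start))
        cursor = max(cursor, end)  # overlapping appointments extend the busy run
    if cursor < day_end:
        gaps.append((cursor, day_end))
    return gaps


print(free_gaps([("10:00", "11:00"), ("10:30", "12:00"), ("14:00", "15:00")],
                "09:00", "18:00"))
# [('09:00', '10:00'), ('12:00', '14:00'), ('15:00', '18:00')]
```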
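To me, this problem is nicely represented by lists of booleans. For ease of explanation, let's assume the length of every potential job is a multiple of 15 minutes. So from 9 to 6 we have 36 "time slots" whose availability we want to track. We represent the availability of a queue in a time slot with a boolean: False if the queue is processing a job, True if the queue is available.
First, we create a list of time slots for each queue as well as for the output. So every queue and the output has time slots t_k, 1 <= k <= 36.
Then, given five job queues q_j, 1 <= j <= 5, we say t_k is "open" at time k if there is at least one q_j whose time-slot list is True at index k.
We can implement this in standalone Python as follows:
N_SLOTS = 36  # 9:00 to 18:00 in 15-minute slots
# Build an independent list per queue: `[slots] * 5` would make all five
# entries refer to one and the same list.
queues = [[True] * N_SLOTS for _ in range(5)]
output = [False] * N_SLOTS
def available(k):
    # Slot k is open if at least one queue is free in it.
    return any(q[k] for q in queues)
Then we can assume there is some function dispatch(length) that assigns a job to an available queue and sets the appropriate slots of queues[q] to False.
Finally, to update the output, we simply call:
def update():
    for k in range(N_SLOTS):
        output[k] = available(k)
Or, more efficiently:
def update(i, j):
    for k in range(i, j):
        output[k] = available(k)
You can then simply call update(i, j) whenever dispatch() updates time slots i to j for a new job. That way, dispatching and updating is an O(n) operation (times the number of queues), where n is the number of time slots that changed, regardless of how many time slots there are in total.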
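The answer leaves dispatch() abstract. Below is a minimal first-fit sketch. It assumes a hypothetical signature dispatch(queues, start, length) that takes the job's first slot as well as its length. After a successful call you would run update(start, start + length).

```python
def dispatch(queues, start, length):
    """Hypothetical helper: book `length` slots starting at slot `start`
    on the first queue that is free for that whole run.

    Returns the index of the chosen queue and marks its slots False.
    Raises ValueError if no queue can take the job.
    """
    stop = start + length
    for q, slots in enumerate(queues):
        # The run must fit inside the day and every slot in it must be free.
        if stop <= len(slots) and all(slots[start:stop]):
            slots[start:stop] = [False] * length
            return q
    raise ValueError("no queue is free for the requested slots")


N_SLOTS = 36  # 9:00 to 18:00 in 15-minute slots
queues = [[True] * N_SLOTS for _ in range(2)]
print(dispatch(queues, 4, 6))  # 0: the first queue takes slots 4..9
print(dispatch(queues, 8, 2))  # 1: the first queue is busy at slots 8..9
```

First fit is only one possible policy. A best-fit variant would scan every queue before choosing one.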
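This would let you create a simple function that maps human-readable times to ranges of slot indices, which in turn lets you make the slots larger or smaller as needed.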
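One way such a mapping could look is sketched below. It assumes "H:MM" strings, a day starting at 9:00 and a configurable slot length. `to_slot_range` is a hypothetical name, and it returns a half-open range [i, j) that matches update(i, j).

```python
def to_slot_range(start, end, day_start="09:00", slot_minutes=15):
    """Hypothetical helper: map "H:MM" times to a half-open slot range [i, j)."""
    def minutes(hhmm):
        hours, mins = map(int, hhmm.split(":"))
        return hours * 60 + mins

    offset = minutes(day_start)
    first, last = minutes(start) - offset, minutes(end) - offset
    if first < 0 or last < first:
        raise ValueError("invalid time range")
    if first % slot_minutes or last % slot_minutes:
        raise ValueError("times must fall on slot boundaries")
    return first // slot_minutes, last // slot_minutes


print(to_slot_range("10:00", "11:30"))                   # (4, 10)
print(to_slot_range("10:00", "11:30", slot_minutes=30))  # (2, 5)
```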
You could also easily extend this idea to a DataFrame in which each column is a queue. That lets you call any(axis=1) on all rows at once to quickly update the output column.
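Here is a minimal pandas sketch of that variant. It assumes one boolean column per queue (named q0 to q4 here purely for illustration) and 36 fifteen-minute slots. DataFrame.any(axis=1) reduces each row to a single open/closed flag.

```python
import pandas as pd

N_SLOTS = 36  # 9:00 to 18:00 in 15-minute slots
# One boolean column per queue; True means the queue is free in that slot.
slots = pd.DataFrame(True, index=range(N_SLOTS),
                     columns=[f"q{j}" for j in range(5)])

# Occupy every queue during slots 4..9 (.loc label slicing is inclusive).
slots.loc[4:9, :] = False
# Free queue q2 again in slot 9 only.
slots.loc[9, "q2"] = True

# A slot is open if any queue is free in it: one vectorised call over all rows.
output = slots.any(axis=1)
print(output[3], output[4], output[9])  # True False True
```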
I'd love to hear suggestions on this approach! Maybe I'm missing some of the problem's complexity, but I think this is a nice solution.
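You can use (datetime, increment) tuples to keep track of changes in availability. A job-start event has increment = -1 (one lane fewer is available) and a job-end event has increment = +1. Then itertools.accumulate lets you compute the cumulative availability over time as jobs start and end. Here is an example implementation: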
from datetime import time
import itertools as it
def compute_availability(jobs, opening_hours, capacity):
    # Each job becomes two events: its start takes a lane (-1), its end frees one (+1).
    jobs = [((x, -1), (y, +1)) for x, y in jobs]
    opens, closes = opening_hours
    events = [[opens, capacity]] + sorted(t for job in jobs for t in job) + [(closes, 0)]
    # Running total of the increments; each entry is [time, availability].
    availability = list(it.accumulate(events,
                                      lambda x, y: [y[0], x[1] + y[1]]))
    for x, y in zip(availability, availability[1:]):
        # If multiple events happen at the same time, only yield the last one.
        if y[0] > x[0]:
            yield x
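This adds artificial (opens, capacity) and (closes, 0) events to initialize the computation. The example above considers a single day, but it is easy to extend it to multiple days by making opens and closes datetime objects that share the date of the first and the last job, respectively.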
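Here is a small sketch of building those datetime bounds. It assumes jobs are (start, stop) datetime pairs and fixed opening hours of 9:00 to 18:00, and `opening_bounds` is a hypothetical name. Note that the nights in between would still count as open unless they are blocked, which the answer shows further down with weight=capacity.

```python
from datetime import datetime, time


def opening_bounds(jobs, open_at=time(9), close_at=time(18)):
    """Hypothetical helper: build datetime `opens`/`closes` values that share
    the date of the first and the last job, respectively."""
    first_day = min(start for start, _ in jobs).date()
    last_day = max(stop for _, stop in jobs).date()
    return datetime.combine(first_day, open_at), datetime.combine(last_day, close_at)


jobs = [(datetime(2020, 3, 14, 10), datetime(2020, 3, 14, 15)),
        (datetime(2020, 3, 15, 9), datetime(2020, 3, 15, 11))]
opens, closes = opening_bounds(jobs)
print(opens, closes)  # 2020-03-14 09:00:00 2020-03-15 18:00:00
```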
Here is the output for the OP's example schedule:
from pprint import pprint
jobs = [(time(10), time(15)),
(time(9), time(11)),
(time(12, 30), time(16)),
(time(10), time(18))]
availability = list(compute_availability(
jobs, opening_hours=(time(9), time(18)), capacity=3
))
pprint(availability)
Prints:
[[datetime.time(9, 0), 2],
[datetime.time(10, 0), 0],
[datetime.time(11, 0), 1],
[datetime.time(12, 30), 0],
[datetime.time(15, 0), 1],
[datetime.time(16, 0), 2]]
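The first element indicates when the availability changes, and the second element gives the availability resulting from that change. For example, at 9 am one job is submitted, causing the availability to drop from 3 to 2. Then at 10 am two more jobs are submitted while the first one is still running, so the availability drops to 0.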
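To get from these change points to what the question asks for (the periods where availability is greater than zero), a small post-processing step is enough. This sketch assumes the [[time, level], ...] list is sorted, as compute_availability produces it, and that the closing time is passed separately. `free_periods` is a hypothetical name.

```python
from datetime import time


def free_periods(availability, closes):
    """Turn [[time, level], ...] change points into (start, end) periods
    where level > 0, merging adjacent free periods.

    `availability` is assumed sorted by time; `closes` ends the last period.
    """
    periods = []
    # Each level holds until the next change point (or until closing time).
    bounds = [t for t, _ in availability[1:]] + [closes]
    for (start, level), end in zip(availability, bounds):
        if level <= 0:
            continue
        if periods and periods[-1][1] == start:
            periods[-1] = (periods[-1][0], end)  # extend the previous period
        else:
            periods.append((start, end))
    return periods


availability = [[time(9), 2], [time(10), 0], [time(11), 1],
                [time(12, 30), 0], [time(15), 1], [time(16), 2]]
print(free_periods(availability, closes=time(18)))
```

Here the 15:00 and 16:00 entries merge into a single 15:00 to 18:00 period, because only the fact that availability is positive matters, not its exact level.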
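Now that we have computed the initial availability, an important aspect is updating it as new jobs are added. Here it is better not to recompute the availability from the full list of jobs, since that can be expensive if many jobs are being tracked. Because availability is already sorted, we can use the bisect module to find the relevant update range in O(log(N)). Then a number of steps need to be performed. Suppose the job is scheduled as [x, y], where x and y are two datetime objects.
1. Check whether the availability in the interval [x, y] is greater than zero (including the event to the left of x, i.e. the previous event).
2. Decrease the availability of all events in [x, y] by 1 (or by the job's weight).
3. If x is not in the list of events, we need to add it; otherwise we need to check whether the x event can be merged with the event to its left.
4. If y is not in the list of events, we need to add it.
Here is the relevant code: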
import bisect
def add_job(availability, job, *, weight=1):
    """weight: how many lanes the job requires"""
    job = list(job)
    start = bisect.bisect(availability, job[:1])
    # Emulate a `bisect_right` on the time only, which doesn't work directly
    # since we're comparing lists of different length.
    if start < len(availability):
        start += (job[0] == availability[start][0])
    if start == 0:
        raise ValueError('The job starts before the schedule opens')
    stop = bisect.bisect(availability, job[1:])
    # The event just before the job start must be checked as well.
    if any(slot[1] < weight for slot in availability[start-1:stop]):
        raise ValueError('The requested time slot is not available')
    for slot in availability[start:stop]:
        slot[1] -= weight
    if job[0] > availability[start-1][0]:
        previous_availability = availability[start-1][1]
        availability.insert(start, [job[0], previous_availability - weight])
        stop += 1
    else:
        availability[start-1][1] -= weight
        if start >= 2 and availability[start-1][1] == availability[start-2][1]:
            del availability[start-1]
            stop -= 1
    if stop == len(availability) or job[1] < availability[stop][0]:
        previous_availability = availability[stop-1][1]
        availability.insert(stop, [job[1], previous_availability + weight])
We can test it by adding a few jobs to the OP's example schedule:
for job in [[time(15), time(17)],
            [time(11, 30), time(12)],
            [time(13), time(14)]]:  # this one should raise since availability is zero
    print(f'\nAdding {job = }')
    add_job(availability, job)
    pprint(availability)
Output:
Adding job = [datetime.time(15, 0), datetime.time(17, 0)]
[[datetime.time(9, 0), 2],
[datetime.time(10, 0), 0],
[datetime.time(11, 0), 1],
[datetime.time(12, 30), 0],
[datetime.time(16, 0), 1],
[datetime.time(17, 0), 2]]
Adding job = [datetime.time(11, 30), datetime.time(12, 0)]
[[datetime.time(9, 0), 2],
[datetime.time(10, 0), 0],
[datetime.time(11, 0), 1],
[datetime.time(11, 30), 0],
[datetime.time(12, 0), 1],
[datetime.time(12, 30), 0],
[datetime.time(16, 0), 1],
[datetime.time(17, 0), 2]]
Adding job = [datetime.time(13, 0), datetime.time(14, 0)]
Traceback (most recent call last):
[...]
ValueError: The requested time slot is not available
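We can also use this interface to block all lanes during periods when the service is unavailable (e.g. from 6 pm until 9 am the next day). Just submit a job with weight=capacity for that period: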
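from datetime import datetime
# Assumes `availability` was built from datetime (not time) objects.
add_job(availability,
        [datetime(2020, 3, 14, 18), datetime(2020, 3, 15, 9)],
        weight=3)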
We can also use add_job to build the complete schedule from scratch:
availability = list(compute_availability(
    [], opening_hours=(time(9), time(18)), capacity=3
))
print('Initial availability')
pprint(availability)
for job in jobs:
    print(f'\nAdding {job = }')
    add_job(availability, job)
    pprint(availability)
Output:
Initial availability
[[datetime.time(9, 0), 3]]
Adding job = (datetime.time(10, 0), datetime.time(15, 0))
[[datetime.time(9, 0), 3],
[datetime.time(10, 0), 2],
[datetime.time(15, 0), 3]]
Adding job = (datetime.time(9, 0), datetime.time(11, 0))
[[datetime.time(9, 0), 2],
[datetime.time(10, 0), 1],
[datetime.time(11, 0), 2],
[datetime.time(15, 0), 3]]
Adding job = (datetime.time(12, 30), datetime.time(16, 0))
[[datetime.time(9, 0), 2],
[datetime.time(10, 0), 1],
[datetime.time(11, 0), 2],
[datetime.time(12, 30), 1],
[datetime.time(15, 0), 2],
[datetime.time(16, 0), 3]]
Adding job = (datetime.time(10, 0), datetime.time(18, 0))
[[datetime.time(9, 0), 2],
[datetime.time(10, 0), 0],
[datetime.time(11, 0), 1],
[datetime.time(12, 30), 0],
[datetime.time(15, 0), 1],
[datetime.time(16, 0), 2],
[datetime.time(18, 0), 3]]
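Unless your time resolution is finer than one minute, I'd suggest using a minute-of-the-day map, where each minute holds the set of jobIds whose time span covers it.
For example: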
# convert time to minute of the day (assumes 24-hour "H:MM" time, but you can make this your own way)
def toMinute(time):
    return sum(p*t for p,t in zip(map(int,time.split(":")),(60,1)))
def toTime(minute):
    return f"{minute//60}:{minute%60:02d}"
# booking a job adds it to all minutes covered by its duration
def book(timeMap,jobId,start,duration):
    startMin = toMinute(start)
    for m in range(startMin,startMin+duration):
        timeMap[m].add(jobId)
# unbooking a job removes it from all minutes where it was present
def unbook(timeMap,jobId):
    for s in timeMap:
        s.discard(jobId)
# return time ranges for minutes meeting a given condition
def minuteSpans(timeMap,condition,start="09:00",end="18:00"):
    start,end = toMinute(start),toMinute(end)
    timeRange = timeMap[start:end]
    match = [condition(s) for s in timeRange]
    # True wherever a new run of equal match values begins
    breaks = [True] + [a!=b for a,b in zip(match,match[1:])]
    starts = [i for (i,a),b in zip(enumerate(match),breaks) if b]
    return [(start+s,start+e) for s,e in zip(starts,starts[1:]+[len(match)]) if match[s]]
def timeSpans(timeMap,condition,start="09:00",end="18:00"):
    return [(toTime(s),toTime(e)) for s,e in minuteSpans(timeMap,condition,start,end)]
# availability is ranges of minutes where the number of jobs is less than your capacity
def available(timeMap,start="09:00",end="18:00",maxJobs=5):
    return timeSpans(timeMap,lambda s:len(s)<maxJobs,start,end)
Example usage:
timeMap = [set() for _ in range(1440)]
book(timeMap,"job1","9:45",25)
book(timeMap,"job2","9:30",45)
book(timeMap,"job3","9:00",90)
print(available(timeMap,maxJobs=3))
[('9:00', '9:45'), ('10:10', '18:00')]
print(timeSpans(timeMap,lambda s:"job3" in s))
[('9:00', '10:30')]
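With a few tweaks you could even have discontinuous jobs that skip certain periods (e.g. lunch time). You can also block certain periods by placing fake jobs in them.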
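A minimal sketch of both tweaks, assuming the same minute-of-day list of sets. `book_segments` is a hypothetical helper that books one job over several segments. To block a period for every lane, the sketch books as many fake jobs as the capacity.

```python
def to_minute(hhmm):
    """Convert "H:MM" (24-hour) to minutes since midnight."""
    hours, minutes = map(int, hhmm.split(":"))
    return hours * 60 + minutes


def book_segments(time_map, job_id, segments):
    """Hypothetical helper: book one job over several ("H:MM", "H:MM")
    segments, so that it can skip a gap such as lunch."""
    for start, end in segments:
        for m in range(to_minute(start), to_minute(end)):
            time_map[m].add(job_id)


CAPACITY = 3
time_map = [set() for _ in range(1440)]  # one set of job ids per minute

# A discontinuous job: 11:00-12:00, lunch break, then 13:00-14:00.
book_segments(time_map, "job1", [("11:00", "12:00"), ("13:00", "14:00")])

# Block lunch for every lane by booking CAPACITY fake jobs over it.
for n in range(CAPACITY):
    book_segments(time_map, f"lunch-{n}", [("12:00", "13:00")])

print(len(time_map[to_minute("12:30")]))  # 3: fully booked
```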
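If you need to manage the job queues separately, you can have separate time maps (one per queue) and combine them into one when you need a global map: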
print(available(timeMap1,maxJobs=1))
print(available(timeMap2,maxJobs=1))
print(available(timeMap3,maxJobs=1))
globalMap = list(set.union(*qs) for qs in zip(timeMap1,timeMap2,timeMap3))
print(available(globalMap,maxJobs=3))
Put all of this into a TimeMap class (instead of individual functions) and you should have a pretty nice toolset to work with.
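Here is a minimal sketch of such a class. The method names mirror the functions above, but the class itself is hypothetical. For readability the span search is rewritten as a single loop over the minutes instead of the zip-based version.

```python
class TimeMap:
    """Sketch: the minute-of-day helpers bundled into one class."""

    def __init__(self):
        self.minutes = [set() for _ in range(1440)]  # one job-id set per minute

    @staticmethod
    def to_minute(hhmm):
        hours, minutes = map(int, hhmm.split(":"))
        return hours * 60 + minutes

    @staticmethod
    def to_time(minute):
        return f"{minute // 60}:{minute % 60:02d}"

    def book(self, job_id, start, duration):
        first = self.to_minute(start)
        for m in range(first, first + duration):
            self.minutes[m].add(job_id)

    def unbook(self, job_id):
        for s in self.minutes:
            s.discard(job_id)

    def spans(self, condition, start="09:00", end="18:00"):
        """Maximal (start, end) ranges whose minutes all satisfy condition."""
        lo, hi = self.to_minute(start), self.to_minute(end)
        result, run_start = [], None
        for m in range(lo, hi):
            if condition(self.minutes[m]):
                if run_start is None:
                    run_start = m  # a matching run begins
            elif run_start is not None:
                result.append((self.to_time(run_start), self.to_time(m)))
                run_start = None
        if run_start is not None:  # a run still open at the end of the range
            result.append((self.to_time(run_start), self.to_time(hi)))
        return result

    def available(self, start="09:00", end="18:00", max_jobs=5):
        return self.spans(lambda s: len(s) < max_jobs, start, end)


tm = TimeMap()
tm.book("job1", "9:45", 25)
tm.book("job2", "9:30", 45)
tm.book("job3", "9:00", 90)
print(tm.available(max_jobs=3))  # [('9:00', '9:45'), ('10:10', '18:00')]
```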
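You can use a dedicated class representing a lane in which jobs can run. These objects can keep track of their jobs and their availability: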
import bisect
from datetime import time
from functools import total_ordering
import math
@total_ordering
class TimeSlot:
    def __init__(self, start, stop, lane):
        self.start = start
        self.stop = stop
        self.lane = lane
    def __contains__(self, other):
        return self.start <= other.start and self.stop >= other.stop
    # Order by ascending start, then descending stop. Swapping the stops
    # between the two tuples gives the descending part without negating times.
    def __lt__(self, other):
        return (self.start, other.stop) < (other.start, self.stop)
    def __eq__(self, other):
        return (self.start, self.stop) == (other.start, other.stop)
    def __str__(self):
        return f'({self.lane}) {[self.start, self.stop]}'
    __repr__ = __str__
class Lane:
    @total_ordering
    class TimeHorizon:
        """Open-ended stop time that compares greater than any real time."""
        def __repr__(self):
            return '...'
        def __lt__(self, other):
            return False
        def __eq__(self, other):
            return False
        @property
        def second(self):
            return math.inf
        @property
        def timestamp(self):
            return math.inf
    time_horizon = TimeHorizon()
    del TimeHorizon
    def __init__(self, start, nr):
        self.nr = nr
        self.availability = [TimeSlot(start, self.time_horizon, self)]
    def __repr__(self):
        return str(self.nr)  # lets a TimeSlot print as "(nr) [...]"
    def add_job(self, job):
        if not isinstance(job, TimeSlot):
            job = TimeSlot(*job, self)
        # We want to bisect_right but only on the start time,
        # so we need to do it manually if they are equal.
        index = bisect.bisect_left(self.availability, job)
        if index < len(self.availability):
            index += (job.start == self.availability[index].start)
        index -= 1  # select the corresponding free slot
        if index < 0:  # no free slot starts at or before the job
            raise ValueError('Requested time slot not available')
        slot = self.availability[index]
        if slot.start > job.start or slot.stop is not self.time_horizon and job.stop > slot.stop:
            raise ValueError('Requested time slot not available')
        if job == slot:
            del self.availability[index]
        elif job.start == slot.start:
            slot.start = job.stop
        elif job.stop == slot.stop:
            slot.stop = job.start
        else:
            slot_end = slot.stop
            slot.stop = job.start
            self.availability.insert(index+1, TimeSlot(job.stop, slot_end, self))
The Lane objects can be used as follows:
from pprint import pprint
lane = Lane(start=time(9), nr=1)
print(lane.availability)
lane.add_job([time(11), time(14)])
pprint(lane.availability)
Output:
[(1) [datetime.time(9, 0), ...]]
[(1) [datetime.time(9, 0), datetime.time(11, 0)],
(1) [datetime.time(14, 0), ...]]
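The availability is updated as jobs are added.
Now we can use several of these lane objects together to represent a full schedule. Jobs can be added as needed and the availability will be updated automatically: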
class Schedule:
    def __init__(self, n_lanes, start):
        self.lanes = [Lane(start, nr=i) for i in range(n_lanes)]
    def add_job(self, job):
        # First fit: put the job on the first lane that can take it.
        for lane in self.lanes:
            try:
                lane.add_job(job)
            except ValueError:
                pass
            else:
                break
        else:
            raise ValueError('Requested time slot not available on any lane')
from pprint import pprint
# Example jobs from OP.
jobs = [(time(10), time(15)),
        (time(9), time(11)),
        (time(12, 30), time(16)),
        (time(10), time(18))]
schedule = Schedule(3, start=time(9))
for job in jobs:
    schedule.add_job(job)
for lane in schedule.lanes:
    pprint(lane.availability)
Output:
[(0) [datetime.time(9, 0), datetime.time(10, 0)],
(0) [datetime.time(15, 0), ...]]
[(1) [datetime.time(11, 0), datetime.time(12, 30)],
(1) [datetime.time(16, 0), ...]]
[(2) [datetime.time(9, 0), datetime.time(10, 0)],
(2) [datetime.time(18, 0), ...]]
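We can create a dedicated tree-like structure that keeps track of the time slots of all lanes, so that the best-fitting slot can be selected when a new job is registered. A node in the tree represents a single time slot, and its children are all the time slots contained within that slot. When a new job is registered, we can then search the tree for the best location. The tree and the lanes share the same time slot objects, so we only need to adjust the tree manually when a slot is deleted or a new one is inserted. Here is the relevant code; it's a bit verbose (just a quick draft):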
import itertools as it
import math
from datetime import datetime, time
class OneStepBuffered:
    """Can back up elements that are consumed by `it.takewhile`.
    From: https://stackoverflow.com/a/30615837/3767239
    """
    _sentinel = object()
    def __init__(self, it):
        self._it = iter(it)
        self._last = self._sentinel
        self._next = self._sentinel
    def __iter__(self):
        return self
    def __next__(self):
        sentinel = self._sentinel
        if self._next is not sentinel:
            next_val, self._next = self._next, sentinel
            return next_val
        try:
            self._last = next(self._it)
            return self._last
        except StopIteration:
            self._last = self._next = sentinel
            raise
    def step_back(self):
        if self._last is self._sentinel:
            raise ValueError("Can't back up a step")
        self._next, self._last = self._last, self._sentinel
def _seconds(t):
    """Helper: time/datetime -> seconds; anything else (the lane's
    open-ended time horizon) -> infinity."""
    if isinstance(t, datetime):
        return t.timestamp()
    if isinstance(t, time):
        return t.hour * 3600 + t.minute * 60 + t.second
    return math.inf
class SlotTree:
    def __init__(self, slot, subslots, parent=None):
        self.parent = parent
        self.slot = slot
        self.subslots = []
        slots = OneStepBuffered(subslots)
        for slot in slots:
            # Slots that end no later than `slot` are nested inside it.
            subslots = it.takewhile(lambda x: x.stop <= slot.stop, slots)
            self.subslots.append(SlotTree(slot, subslots, self))
            try:
                slots.step_back()
            except ValueError:
                break
    def __str__(self):
        sub_repr = ['\n| '.join(str(slot).split('\n'))
                    for slot in self.subslots]
        sub_repr = [f'| {x}' for x in sub_repr]
        sub_repr = '\n'.join(sub_repr)
        sep = '\n' if sub_repr else ''
        return f'{self.slot}{sep}{sub_repr}'
    def find_minimal_containing_slot(self, slot):
        try:
            # Shortest containing slot; open-ended slots count as infinitely long.
            return min(self.find_containing_slots(slot),
                       key=lambda x: _seconds(x.slot.stop) - _seconds(x.slot.start))
        except ValueError:
            raise ValueError('Requested time slot not available') from None
    def find_containing_slots(self, slot):
        for candidate in self.subslots:
            if slot in candidate.slot:
                yield from candidate.find_containing_slots(slot)
                yield candidate
    @classmethod
    def from_slots(cls, slots):
        # Ascending in start time, descending in stop time (secondary).
        return cls(cls.__name__, sorted(slots))
class Schedule:
    def __init__(self, n_lanes, start):
        self.lanes = [Lane(start, i+1) for i in range(n_lanes)]
        self.slots = SlotTree.from_slots(
            s for lane in self.lanes for s in lane.availability)
    def add_job(self, job):
        if not isinstance(job, TimeSlot):
            job = TimeSlot(*job, lane=None)
        # Minimal containing slot is one possible strategy,
        # others can be implemented as well.
        node = self.slots.find_minimal_containing_slot(job)
        lane = node.slot.lane
        siblings = node.parent.subslots
        fills_slot = job == node.slot
        splits_slot = job.start > node.slot.start and job.stop < node.slot.stop
        lane.add_job(job)  # adjusts node.slot in place where possible
        if fills_slot:
            siblings.remove(node)
        elif splits_slot:
            # Reuse the TimeSlot the lane just inserted after node.slot, so that
            # the tree and the lane keep sharing the same objects.
            position = next(i for i, s in enumerate(lane.availability) if s is node.slot)
            new_slot = lane.availability[position + 1]
            siblings.insert(siblings.index(node) + 1, SlotTree(new_slot, [], node.parent))
Now we can use the Schedule class to automatically assign jobs to lanes and update their availability:
if __name__ == '__main__':
    jobs = [(time(10), time(15)),  # example from OP
            (time(9), time(11)),
            (time(12, 30), time(16)),
            (time(10), time(18))]
    schedule = Schedule(3, start=time(9))
    print(schedule.slots, end='\n\n')
    for job in jobs:
        print(f'Adding {TimeSlot(*job, "new slot")}')
        schedule.add_job(job)
        print(schedule.slots, end='\n\n')
Output:
SlotTree
| (1) [datetime.time(9, 0), ...]
| (2) [datetime.time(9, 0), ...]
| (3) [datetime.time(9, 0), ...]
Adding (new slot) [datetime.time(10, 0), datetime.time(15, 0)]
SlotTree
| (1) [datetime.time(9, 0), datetime.time(10, 0)]
| (1) [datetime.time(15, 0), ...]
| (2) [datetime.time(9, 0), ...]
| (3) [datetime.time(9, 0), ...]
Adding (new slot) [datetime.time(9, 0), datetime.time(11, 0)]
SlotTree
| (1) [datetime.time(9, 0), datetime.time(10, 0)]
| (1) [datetime.time(15, 0), ...]
| (2) [datetime.time(11, 0), ...]
| (3) [datetime.time(9, 0), ...]
Adding (new slot) [datetime.time(12, 30), datetime.time(16, 0)]
SlotTree
| (1) [datetime.time(9, 0), datetime.time(10, 0)]
| (1) [datetime.time(15, 0), ...]
| (2) [datetime.time(11, 0), datetime.time(12, 30)]
| (2) [datetime.time(16, 0), ...]
| (3) [datetime.time(9, 0), ...]
Adding (new slot) [datetime.time(10, 0), datetime.time(18, 0)]
SlotTree
| (1) [datetime.time(9, 0), datetime.time(10, 0)]
| (1) [datetime.time(15, 0), ...]
| (2) [datetime.time(11, 0), datetime.time(12, 30)]
| (2) [datetime.time(16, 0), ...]
| (3) [datetime.time(9, 0), datetime.time(10, 0)]
| (3) [datetime.time(18, 0), ...]
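The number (i) denotes the lane number, and [] denotes an available time slot on that lane. A ... denotes an "open end" (the time horizon). As we can see, the tree does not restructure itself when time slots are adjusted; that would be a possible improvement. Ideally, for each new job the best-fitting time slot would be popped from the tree. Then, depending on how the job fits into the slot, the adjusted slot and possibly a new one would be pushed back into the tree (or nothing at all, if the job fills the slot exactly).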
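The pop-and-push-back idea can be sketched without a tree by keeping the free slots in a flat list. This is a simpler swapped-in structure, not the tree above. Times are assumed to be minutes since midnight, math.inf marks an open end, and `best_fit_schedule` and `hm` are hypothetical names. Each search is linear rather than logarithmic, but the slot bookkeeping is the same.

```python
import math


def best_fit_schedule(n_lanes, day_start, jobs):
    """Sketch of the pop/push-back idea using a flat list instead of a tree.

    Free slots are (start, stop, lane) tuples in minutes; math.inf marks an
    open-ended slot. For each job, the shortest containing slot is popped and
    the leftover pieces (if any) are pushed back.
    """
    free = [(day_start, math.inf, lane) for lane in range(1, n_lanes + 1)]
    for start, stop in jobs:
        candidates = [s for s in free if s[0] <= start and stop <= s[1]]
        if not candidates:
            raise ValueError("Requested time slot not available")
        # Best fit: shortest containing slot, ties broken by lane number.
        best = min(candidates, key=lambda s: (s[1] - s[0], s[2]))
        free.remove(best)  # pop
        slot_start, slot_stop, lane = best
        if slot_start < start:  # push back the part before the job
            free.append((slot_start, start, lane))
        if stop < slot_stop:  # push back the part after the job
            free.append((stop, slot_stop, lane))
    return sorted(free, key=lambda s: (s[2], s[0]))


def hm(h, m=0):
    return h * 60 + m  # minutes since midnight


jobs = [(hm(10), hm(15)), (hm(9), hm(11)), (hm(12, 30), hm(16)), (hm(10), hm(18))]
for slot in best_fit_schedule(3, hm(9), jobs):
    print(slot)
```

For the OP's jobs this ends with the same free slots per lane as the tree-based output.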
The examples above only consider a single day and time objects, but the code is easily extended to use datetime objects.
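Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.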