I have a two-column DataFrame that contains the input data. The first column is a start timestamp and the second is a duration in minutes. Think of a machine that runs from start to start + duration. I want to use this information to build a 1-D array of length 8760*60 that covers every minute of the year, with a 1 wherever the machine is running and a 0 otherwise. The following MWE does the job, but it is slow because of the for loop, and I don't know how to vectorize it.
import pandas as pd
import numpy as np
# Start and end of time horizon
start = pd.Timestamp(year=2019, month=1, day=1, hour=0, tz='UTC')
end = pd.Timestamp(year=2019, month=12, day=31, hour=23, minute=59, tz='UTC')
# DataFrame of time horizon
dates = pd.DataFrame(pd.date_range(start, end, freq='min'))
# Starting points
t1 = pd.Timestamp(year=2019, month=1, day=2, hour=0, tz='UTC')
t2 = pd.Timestamp(year=2019, month=1, day=1, hour=0, minute=3, tz='UTC')
# Durations
d1 = 5
d2 = 30
# DataFrame from input data
data = pd.DataFrame(
    data=[
        [t1, d1],
        [t2, d2],
    ],
    columns=[
        'start',
        'duration',
    ]
)
# Array to be filled
on = np.zeros(8760*60)
# Loop over data rows
for idx in data.index:
    # Position of this start time in the minute index
    # (named start_pos so the Timestamp `start` above is not overwritten)
    start_pos = dates[dates[0] == data.loc[idx, 'start']].index[0]
    # Duration from data
    duration = data.loc[idx, 'duration']
    # Put 1s in the on array from start_pos to start_pos+duration
    on[start_pos: start_pos+duration] = 1
Does this work for you:
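# End at 23:59 on Dec 31 so the last day of the year is included
idx = pd.date_range(pd.Timestamp('2019-01-01 00:00', tz='UTC'),
                    pd.Timestamp('2019-12-31 23:59', tz='UTC'),
                    freq='1min')
df = pd.DataFrame({'on': 0}, index=idx)
def to_mins(row):
    # All minutes during which this row's run is active
    return set(pd.date_range(row['start'], periods=row['duration'], freq='1min'))
idx_on = set().union(*data[['start', 'duration']].apply(to_mins, axis='columns'))
# Recent pandas versions reject a set as an indexer, so pass a sorted list
df.loc[sorted(idx_on), 'on'] = 1
on = df.on.values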
If a duration can produce timestamps outside of 2019, you could use this in place of to_mins:
def to_min_range(row):
    return set(
        m
        for m in pd.date_range(row['start'], periods=row['duration'], freq='1min')
        if m.year == 2019
    )
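As an alternative to filtering timestamp by timestamp, here is a minimal sketch that builds the mask with Index.isin, which ignores any minute that is not part of the horizon. The sample data (a second run that spills over into 2020) is made up for illustration, and the names ranges and on_minutes are not from the answer above.

```python
import pandas as pd

# Time horizon: every minute of 2019 (UTC)
idx = pd.date_range("2019-01-01 00:00", "2019-12-31 23:59", freq="min", tz="UTC")

# Hypothetical input: the second run spills over into 2020
data = pd.DataFrame({
    "start": pd.to_datetime(["2019-01-02 00:00", "2019-12-31 23:50"], utc=True),
    "duration": [5, 30],
})

# One DatetimeIndex per run, then concatenated into a single index
ranges = [
    pd.date_range(s, periods=d, freq="min")
    for s, d in zip(data["start"], data["duration"])
]
on_minutes = ranges[0].append(ranges[1:])

# isin() only marks minutes that exist in idx, so the minutes in 2020 are dropped
on = idx.isin(on_minutes).astype(int)
```

This still builds one date_range per row, so it is best suited to inputs with relatively few rows.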
idx = pd.date_range(start, end, freq="min").to_series()
data["duration"] = pd.to_timedelta(data.duration, unit="min")
res = pd.concat(
    [
        i.to_series()
        for i in data.apply(
            # pandas >= 1.4 spells this inclusive="right"; older versions use closed="right"
            lambda row: pd.date_range(
                row["start"], row["start"] + row["duration"], freq="min", inclusive="right"
            ),
            axis=1,
        )
    ]
)
_, res = idx.align(res)
on = (~res.isna()).astype("int").values
This also uses apply, which runs len(data) times; hopefully that is a much smaller number than the number of minutes in a year.
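If even len(data) Python-level calls are too many, a loop-free NumPy variant is possible. Note that this is a different technique from the answers above (a difference array followed by a cumulative sum), sketched under the assumption that every start falls on a whole minute; starts with seconds would be floored to the minute. The names n_minutes, year_start, begin, stop and delta are illustrative, and the sample data mirrors the question's.

```python
import numpy as np
import pandas as pd

n_minutes = 365 * 24 * 60  # 2019 is not a leap year
year_start = pd.Timestamp("2019-01-01", tz="UTC")

data = pd.DataFrame({
    "start": pd.to_datetime(["2019-01-02 00:00", "2019-01-01 00:03"], utc=True),
    "duration": [5, 30],
})

# Minute offset of each start relative to the beginning of the year
begin = ((data["start"] - year_start) // pd.Timedelta(minutes=1)).to_numpy()
stop = begin + data["duration"].to_numpy()

# Clip so runs that leave the horizon do not index out of bounds
begin = np.clip(begin, 0, n_minutes)
stop = np.clip(stop, 0, n_minutes)

# Difference array: +1 where a run starts, -1 where it stops
delta = np.zeros(n_minutes + 1, dtype=np.int64)
np.add.at(delta, begin, 1)   # add.at handles repeated positions correctly
np.add.at(delta, stop, -1)

# A running sum above zero means at least one run covers that minute
on = (np.cumsum(delta[:-1]) > 0).astype(int)
```

Like the loop in the question, this marks the half-open interval from start up to, but not including, start + duration, so on.sum() equals the total duration as long as the runs do not overlap.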
EDIT:
By default, the time intervals produced by date_range are closed on both ends. In the comments, the OP mentioned that they expect the sum to match the total duration the equipment was on; this can be achieved by making one end of the interval open. I chose to keep the right end closed, so the start point is the one that gets dropped.
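The off-by-one behind this edit can be checked with a small sketch. The timestamp and the 5-minute duration are taken from the question's first row; the variable names are illustrative.

```python
import pandas as pd

t = pd.Timestamp("2019-01-02 00:00", tz="UTC")
duration = pd.Timedelta(minutes=5)

# Both endpoints included: 00:00, 00:01, ..., 00:05 gives 6 timestamps
both_closed = pd.date_range(t, t + duration, freq="min")

# Fixing the number of periods instead gives exactly 5 timestamps
# (00:00 to 00:04), i.e. the end point is left out
half_open = pd.date_range(t, periods=5, freq="min")

print(len(both_closed), len(half_open))  # 6 5
```

With both ends closed, each run contributes one minute more than its duration, which is why the sum only matches once one end is opened.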