I have a DataFrame with a multi-index consisting of (phase, service_group, station, year, period), whose purpose is to return capacity_required when all 5 values of the multi-index are specified. For example, in phase Final, service_group West, station Milton, year 2025, period Peak Hour 1, the capacity_required is 1500.
Currently there are 7 possible periods, two of which are "Off-Peak Hour" and "Shoulder Hour".
I need to add a new period to every instance of the multi-index, called Off-Peak Shoulder, where the new value is defined as the average of Off-Peak Hour and Shoulder Hour.
So far I have the following code:
import pandas as pd
import os
directory = '/Users/mark/PycharmProjects/psrpcl_data'
capacity_required_file = 'Capacity_Requirements.csv'
capacity_required_path = os.path.join(directory, capacity_required_file)
df_capacity_required = pd.read_csv(capacity_required_path, sep=',',
                                   usecols=['phase', 'service_group', 'station', 'year', 'period', 'capacity_required'])
df_capacity_required.set_index(['phase', 'service_group', 'station', 'year'], inplace=True)
df_capacity_required.sort_index(inplace=True)
print(df_capacity_required.head(14))
And the output from the above code is:
                                                          period             capacity_required
phase  service_group  station                       year
Early  Barrie         Allandale Waterfront Station  2025  AM Peak Period     490
                                                    2025  Off-Peak Hour      100
                                                    2025  PM Peak Period     520
                                                    2025  Peak Hour 2        250
                                                    2025  Peak Hour 5        180
                                                    2025  Peak Hour 6        180
                                                    2025  Shoulder Hour      250
                                                    2026  AM Peak Period     520
                                                    2026  Off-Peak Hour      50
                                                    2026  PM Peak Period     520
                                                    2026  Peak Hour 2        260
                                                    2026  Peak Hour 5        180
                                                    2026  Peak Hour 6        180
                                                    2026  Shoulder Hour      250
The above is just the first 14 lines of about 30K lines, showing two years' worth of periods. Notice there are 7 periods per year.
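The "7 periods per year" observation can be checked programmatically rather than by eye. The sketch below rebuilds the 14-row sample by hand (the names `index_cols`, `sample`, `periods_per_key` and `all_have_seven` are illustrative, not part of my real code) and counts rows per (phase, service_group, station, year) combination. On the full file the same groupby should work on df_capacity_required directly, assuming the index is set as shown above.

```python
import pandas as pd

index_cols = ['phase', 'service_group', 'station', 'year']

# Values copied from the 14-row sample above (one station, two years).
sample = {
    2025: {'AM Peak Period': 490, 'Off-Peak Hour': 100, 'PM Peak Period': 520,
           'Peak Hour 2': 250, 'Peak Hour 5': 180, 'Peak Hour 6': 180,
           'Shoulder Hour': 250},
    2026: {'AM Peak Period': 520, 'Off-Peak Hour': 50, 'PM Peak Period': 520,
           'Peak Hour 2': 260, 'Peak Hour 5': 180, 'Peak Hour 6': 180,
           'Shoulder Hour': 250},
}
rows = [('Early', 'Barrie', 'Allandale Waterfront Station', year, period, capacity)
        for year, by_period in sample.items()
        for period, capacity in by_period.items()]
df = pd.DataFrame(rows, columns=index_cols + ['period', 'capacity_required'])
df = df.set_index(index_cols).sort_index()

# Count rows per (phase, service_group, station, year) combination.
periods_per_key = df.groupby(level=index_cols).size()
print(periods_per_key)

# True only if every combination has exactly 7 periods.
all_have_seven = bool((periods_per_key == 7).all())
print(all_have_seven)
```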
I am trying to create a new "period" called "Off-Peak Shoulder" to be added to every single (phase, service_group, station, year) combination which is to be the average of Off-Peak and Shoulder.
The following line correctly calculates the one Off-Peak Shoulder value per index value:
off_peak_shoulder = df_capacity_required.loc[df_capacity_required.period == 'Off-Peak Hour', 'capacity_required'].add(
    df_capacity_required.loc[df_capacity_required.period == 'Shoulder Hour', 'capacity_required']).div(2)
print(off_peak_shoulder)
The above code provides the following (correct) Off-Peak Shoulder series as output:
phase    service_group          station                       year
Early    Barrie                 Allandale Waterfront Station  2025    0.0
                                                              2026    0.0
                                                              2027    0.0
                                                              2028    0.0
                                                              2029    0.0
                                                              ...
Initial  Union Pearson Express  Pearson Station               2023  160.0
                                                              2024  160.0
                                Weston Station                2022   80.0
                                                              2023  105.0
                                                              2024  105.0
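The averaging works because both filtered series keep the (phase, service_group, station, year) index, so .add() pairs the values up key by key. Here is a minimal sketch of that alignment, using the Off-Peak Hour and Shoulder Hour values from the 14-row sample printed earlier. The 2027 row is hypothetical: I added it only to show what I would expect when a key is missing one of the two periods.

```python
import pandas as pd

index_cols = ['phase', 'service_group', 'station', 'year']
df = pd.DataFrame({
    'phase': ['Early'] * 5,
    'service_group': ['Barrie'] * 5,
    'station': ['Allandale Waterfront Station'] * 5,
    # 2027 is a hypothetical year that has no 'Shoulder Hour' row.
    'year': [2025, 2025, 2026, 2026, 2027],
    'period': ['Off-Peak Hour', 'Shoulder Hour',
               'Off-Peak Hour', 'Shoulder Hour', 'Off-Peak Hour'],
    'capacity_required': [100, 250, 50, 250, 80],
}).set_index(index_cols)

off_peak = df.loc[df.period == 'Off-Peak Hour', 'capacity_required']
shoulder = df.loc[df.period == 'Shoulder Hour', 'capacity_required']

# .add() aligns on the shared multi-index; a key present in only one
# series produces NaN because no fill_value is given.
average = off_peak.add(shoulder).div(2)
print(average)
```

If NaN is not what you want for incomplete keys, Series.add accepts a fill_value argument, although averaging against a filled-in zero may not be meaningful for capacity data.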
Question: How do I merge/join the off_peak_shoulder series into df_capacity_required to get Off-Peak Shoulder to be one more entry under "period", as shown below?
                                                          period             capacity_required
phase  service_group  station                       year
Early  Barrie         Allandale Waterfront Station  2025  AM Peak Period     490
                                                    2025  Off-Peak Hour      100
                                                    2025  PM Peak Period     520
                                                    2025  Peak Hour 2        250
                                                    2025  Peak Hour 5        180
                                                    2025  Peak Hour 6        180
                                                    2025  Shoulder Hour      250
                                                    2025  Off-Peak Shoulder  175
                                                    2026  AM Peak Period     520
                                                    2026  Off-Peak Hour      50
                                                    2026  PM Peak Period     520
                                                    2026  Peak Hour 2        260
                                                    2026  Peak Hour 5        180
                                                    2026  Peak Hour 6        180
                                                    2026  Shoulder Hour      250
                                                    2026  Off-Peak Shoulder  150
I slept on the problem and woke up with a solution. I already have the list of values I need, with the correct multi-index set for each value. I was thinking I needed some complex multi-index insertion code, but actually I just needed to put the created DataFrame in the same form as the original DataFrame, and concat the two together.
Here is the code I added. Note the first line is the same as the original code, except I added a call to reset_index.
df_new = df_capacity_required.loc[df_capacity_required.period == 'Off-Peak Hour', 'capacity_required'].add(
    df_capacity_required.loc[df_capacity_required.period == 'Shoulder Hour', 'capacity_required']).div(2).reset_index()
# Label the averaged rows with the new period name
df_new['period'] = 'Off-Peak Shoulder'
# Restore the same four-level index as the original DataFrame
df_new.set_index(['phase', 'service_group', 'station', 'year'], inplace=True)
# concat lives in the pandas namespace, so it is called as pd.concat
df_capacity_required = pd.concat([df_capacity_required, df_new])
df_capacity_required.sort_index(inplace=True)
print(df_capacity_required.head(16))
And that print statement gives the following desired output:
                                                          period             capacity_required
phase  service_group  station                       year
Early  Barrie         Allandale Waterfront Station  2025  AM Peak Period     490
                                                    2025  Off-Peak Hour      100
                                                    2025  PM Peak Period     520
                                                    2025  Peak Hour 2        250
                                                    2025  Peak Hour 5        180
                                                    2025  Peak Hour 6        180
                                                    2025  Shoulder Hour      250
                                                    2025  Off-Peak Shoulder  175
                                                    2026  AM Peak Period     520
                                                    2026  Off-Peak Hour      50
                                                    2026  PM Peak Period     520
                                                    2026  Peak Hour 2        260
                                                    2026  Peak Hour 5        180
                                                    2026  Peak Hour 6        180
                                                    2026  Shoulder Hour      250
                                                    2026  Off-Peak Shoulder  150
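For anyone who wants to try the approach without my CSV file, here is a minimal self-contained sketch of the same steps on a hand-built frame. It uses only the Off-Peak Hour and Shoulder Hour rows from the sample above. The names `index_cols`, `df` and `combined` are illustrative stand-ins for my real variables.

```python
import pandas as pd

index_cols = ['phase', 'service_group', 'station', 'year']
df = pd.DataFrame({
    'phase': ['Early'] * 4,
    'service_group': ['Barrie'] * 4,
    'station': ['Allandale Waterfront Station'] * 4,
    'year': [2025, 2025, 2026, 2026],
    'period': ['Off-Peak Hour', 'Shoulder Hour', 'Off-Peak Hour', 'Shoulder Hour'],
    'capacity_required': [100, 250, 50, 250],
}).set_index(index_cols)

# Average the two periods per index key, then turn the index back into columns.
df_new = (df.loc[df.period == 'Off-Peak Hour', 'capacity_required']
            .add(df.loc[df.period == 'Shoulder Hour', 'capacity_required'])
            .div(2)
            .reset_index())

# Give the new rows the same shape as the original frame.
df_new['period'] = 'Off-Peak Shoulder'
df_new = df_new.set_index(index_cols)

# Stack the original and new rows; pd.concat matches columns by name.
combined = pd.concat([df, df_new]).sort_index()
print(combined)
```

One side effect to be aware of: because the averages are floats, the capacity_required column in the combined frame becomes float64 even though the original values were integers.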
Thanks to everyone who read the question. It is very nice knowing there are people on StackOverflow willing to help when someone gets stuck.