
How to merge a series as members of one column of a multi-index of a DataFrame

I have a DataFrame with a multi-index consisting of (phase, service_group, station, year, period), whose purpose is to return "capacity_required" when all 5 values of the multi-index are specified. For example, in phase Final, service_group West, station Milton, year 2025, period Peak Hour 1, the capacity_required is 1500.

Currently there are 7 possible periods, two of which are "Off-Peak Hour" and "Shoulder Hour".

I need to add a new period to every instance of the multi-index, called Off-Peak Shoulder, where the new value is defined as the average of Off-Peak Hour and Shoulder Hour.

So far I have the following code:

import pandas as pd
import os

directory = '/Users/mark/PycharmProjects/psrpcl_data'
capacity_required_file = 'Capacity_Requirements.csv'
capacity_required_path = os.path.join(directory, capacity_required_file)

df_capacity_required = pd.read_csv(capacity_required_path, sep=',',
                       usecols=['phase', 'service_group', 'station', 'year', 'period', 'capacity_required'])

df_capacity_required.set_index(['phase', 'service_group', 'station', 'year'], inplace=True)
df_capacity_required.sort_index(inplace=True)

print(df_capacity_required.head(14))

And the output from the above code is:

                                                               period  capacity_required
phase service_group station                      year
Early Barrie        Allandale Waterfront Station 2025  AM Peak Period                490
                                                 2025   Off-Peak Hour                100
                                                 2025  PM Peak Period                520
                                                 2025     Peak Hour 2                250
                                                 2025     Peak Hour 5                180
                                                 2025     Peak Hour 6                180
                                                 2025   Shoulder Hour                250
                                                 2026  AM Peak Period                520
                                                 2026   Off-Peak Hour                50
                                                 2026  PM Peak Period                520
                                                 2026     Peak Hour 2                260
                                                 2026     Peak Hour 5                180
                                                 2026     Peak Hour 6                180
                                                 2026   Shoulder Hour                250

The above is just the first 14 lines of about 30K lines. It shows two years' worth of periods. Notice there are 7 periods per year.

I am trying to create a new "period" called "Off-Peak Shoulder", added to every single (phase, service_group, station, year) combination, whose value is the average of Off-Peak Hour and Shoulder Hour.

The following line correctly calculates the one Off-Peak Shoulder value per index value:

off_peak_shoulder = df_capacity_required.loc[df_capacity_required.period == 'Off-Peak Hour', 'capacity_required'].add(
                    df_capacity_required.loc[df_capacity_required.period == 'Shoulder Hour', 'capacity_required']).div(2)

print(off_peak_shoulder)

The above code provides the following (correct) Off-Peak Shoulder series as output:

phase    service_group          station                       year
Early    Barrie                 Allandale Waterfront Station  2025      0.0
                                                              2026      0.0
                                                              2027      0.0
                                                              2028      0.0
                                                              2029      0.0
                                                                      ...
Initial  Union Pearson Express  Pearson Station               2023    160.0
                                                              2024    160.0
                                Weston Station                2022     80.0
                                                              2023    105.0
                                                              2024    105.0

Question: How do I merge/join the off_peak_shoulder series into df_capacity_required to get Off-Peak Shoulder to be one more entry under "period", as shown below?

                                                               period  capacity_required
phase service_group station                      year
Early Barrie        Allandale Waterfront Station 2025    AM Peak Period                490
                                                 2025     Off-Peak Hour                100
                                                 2025    PM Peak Period                520
                                                 2025       Peak Hour 2                250
                                                 2025       Peak Hour 5                180
                                                 2025       Peak Hour 6                180
                                                 2025     Shoulder Hour                250
                                                 2025 Off-Peak Shoulder                175
                                                 2026    AM Peak Period                520
                                                 2026     Off-Peak Hour                50
                                                 2026    PM Peak Period                520
                                                 2026       Peak Hour 2                260
                                                 2026       Peak Hour 5                180
                                                 2026       Peak Hour 6                180
                                                 2026     Shoulder Hour                250
                                                 2026 Off-Peak Shoulder                150

I slept on the problem and woke up with a solution. I already have the list of values I need, with the correct multi-index set for each value. I was thinking I needed some complex multi-index insertion code, but actually I just needed to put the created DataFrame in the same form as the original DataFrame, and concat the two together.

Here is the code I added. Note the first line is the same as the original code, except I added a call to reset_index.

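    # Average the two periods per index key, then turn the index back into columns
    df_new = df_capacity_required.loc[df_capacity_required.period == 'Off-Peak Hour', 'capacity_required'].add(
        df_capacity_required.loc[df_capacity_required.period == 'Shoulder Hour', 'capacity_required']).div(2).reset_index()
    # Label the new rows and give them the same index as the original DataFrame
    df_new['period'] = 'Off-Peak Shoulder'
    df_new.set_index(['phase', 'service_group', 'station', 'year'], inplace=True)

    # Append the new rows and re-sort so they sit with their (phase, service_group, station, year) group
    df_capacity_required = pd.concat([df_capacity_required, df_new])
    df_capacity_required.sort_index(inplace=True)

    print(df_capacity_required.head(16))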

And that print statement gives the following desired output:

                                                               period  capacity_required
phase service_group station                      year
Early Barrie        Allandale Waterfront Station 2025    AM Peak Period                490
                                                 2025     Off-Peak Hour                100
                                                 2025    PM Peak Period                520
                                                 2025       Peak Hour 2                250
                                                 2025       Peak Hour 5                180
                                                 2025       Peak Hour 6                180
                                                 2025     Shoulder Hour                250
                                                 2025 Off-Peak Shoulder                175
                                                 2026    AM Peak Period                520
                                                 2026     Off-Peak Hour                50
                                                 2026    PM Peak Period                520
                                                 2026       Peak Hour 2                260
                                                 2026       Peak Hour 5                180
                                                 2026       Peak Hour 6                180
                                                 2026     Shoulder Hour                250
                                                 2026 Off-Peak Shoulder                150
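For anyone who wants to try the approach without the CSV file, here is a self-contained sketch of the same idea on a few rows taken from the output above. The helper name `add_average_period`, the `KEYS` list and the `toy` frame are my own illustration and not part of the original code; it assumes each period appears at most once per index key.

```python
import pandas as pd

KEYS = ["phase", "service_group", "station", "year"]


def add_average_period(df, first, second, new_label):
    """Return df plus one row per index key holding the mean of two periods."""
    a = df.loc[df["period"] == first, "capacity_required"]
    b = df.loc[df["period"] == second, "capacity_required"]
    # a and b share the 4-level index, so add() aligns them key by key.
    new_rows = a.add(b).div(2).to_frame()
    # Put the label in front so the columns match df: period, capacity_required.
    new_rows.insert(0, "period", new_label)
    return pd.concat([df, new_rows]).sort_index()


# Hypothetical toy rows, values copied from the sample output
rows = [
    ("Early", "Barrie", "Allandale Waterfront Station", 2025, "AM Peak Period", 490),
    ("Early", "Barrie", "Allandale Waterfront Station", 2025, "Off-Peak Hour", 100),
    ("Early", "Barrie", "Allandale Waterfront Station", 2025, "Shoulder Hour", 250),
    ("Early", "Barrie", "Allandale Waterfront Station", 2026, "AM Peak Period", 520),
    ("Early", "Barrie", "Allandale Waterfront Station", 2026, "Off-Peak Hour", 50),
    ("Early", "Barrie", "Allandale Waterfront Station", 2026, "Shoulder Hour", 250),
]
toy = pd.DataFrame(rows, columns=KEYS + ["period", "capacity_required"]).set_index(KEYS)

result = add_average_period(toy, "Off-Peak Hour", "Shoulder Hour", "Off-Peak Shoulder")
print(result)
```

Because the averaged rows are floats, the whole capacity_required column becomes float after the concat.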

But thanks to everyone who read the question. It is very nice knowing there are people out there on StackOverflow willing to help when someone gets stuck.
