Pandas dataframe - reshaping columns using start and end pairs

Question

I have been trying to figure out a way to transform this dataframe. It contains only two columns; one is timeseries data and the other is event markers. Here is an example of the initial dataframe:

df1 = pd.DataFrame({'Time': ['1:42 AM','2:30 AM','3:29 AM','4:19 AM','4:37 AM','4:59 AM','5:25 AM','5:33 AM','6:48 AM'], 
                    'Event': ['End','Start','End','Start','End','Start','Start','End','Start']})

This is how I would like it to look after transformation:

df2 = pd.DataFrame({'Start': ['', '2:30 AM', '4:19 AM', '4:59 AM', '5:25 AM', '6:48 AM'],
                    'End': ['1:42 AM', '3:29 AM', '4:37 AM', '', '5:33 AM', '']})

Essentially, I want to make the event markers the new columns, with the start and end times paired up down the table as they happen.

This example includes both exceptions that will sometimes happen:

First row of data is an 'End' marker (data gets truncated by date).
Sometimes there will not be an 'End' marker for a specific 'Start' marker (job failed or wasn't complete at time the report was run).

I looked at pivot and pivot_table but I wasn't able to add an index that got the output I wanted. Pretty sure this should be possible, I just am not an expert with dataframes yet.

Answer 1

Try:

df1["tmp"] = df1["Event"].eq("Start").cumsum()
df1 = df1.pivot(index="tmp", columns="Event", values="Time").fillna("")
df1.columns.name, df1.index.name = None, None

print(df1[["Start", "End"]])

Prints:

     Start      End
0           1:42 AM
1  2:30 AM  3:29 AM
2  4:19 AM  4:37 AM
3  4:59 AM         
4  5:25 AM  5:33 AM
5  6:48 AM

Answer 2

The difficulty of this operation is that you have to create your custom index. Besides using pivot_table and some index operations to get your desired format:

piv = (
    df1
    .assign(idx=df1["Event"].eq("Start").cumsum())
    .pivot_table(index="idx", columns="Event", values="Time", aggfunc=lambda x: x, fill_value="")
    .rename_axis(None, axis="columns")
    .reset_index(drop=True)
    .reindex(columns=["Start", "End"])
)

     Start      End
0           1:42 AM
1  2:30 AM  3:29 AM
2  4:19 AM  4:37 AM
3  4:59 AM         
4  5:25 AM  5:33 AM
5  6:48 AM

Pandas dataframe - reshaping columns using start and end pairs

Question

2 answers

solution1
3 2022-09-19 21:56:39

solution2
1 2022-09-19 22:03:06

Pandas dataframe - reshaping columns using start and end pairs

Question

2 answers

solution1 3 2022-09-19 21:56:39

solution2 1 2022-09-19 22:03:06

solution1
3 2022-09-19 21:56:39

solution2
1 2022-09-19 22:03:06