I have the following dataframe, which records the start and end times of different jobs at different intervals. A small part of the dataframe is shown below.
Dataframe(df):
result | job | time
START | JOB0 | 1357
START | JOB2 | 2405
END | JOB2 | 2379
START | JOB3 | 4010
END | JOB0 | 5209
END | JOB3 | 6578
START | JOB0 | 6000
END | JOB0 | 6100
(Note: the original dataframe has 5 jobs, JOB0 to JOB4.) I want to convert the values (START and END) of the column result into individual columns in the dataframe.
Required Dataframe(df2)
job | START | END
JOB0 | 1357 | 5209
JOB2 | 2405 | 2379
JOB3 | 4010 | 6578
JOB0 | 6000 | 6100
Code
I tried implementing this using pivot_table, but it returns aggregated values, which is not what I need.
df2 = df.pivot_table('time', 'job','result')
Code Output
job | END | START
JOB0 | 5.000589e+08 | 5.000636e+08
JOB1 | 4.999141e+08 | 4.999188e+08
JOB2 | 5.001668e+08 | 5.001715e+08
JOB3 | 4.995190e+08 | 4.995187e+08
JOB4 | 5.003238e+08 | 5.003236e+08
How can I attain the required dataframe?
You have duplicate job values (JOB0 has 2 different start and end times), so you also need to pivot by the cumulative count (groupby.cumcount) so that the index entries are unique. You can tidy up the pivot afterwards by dropping the cumulative count level.
df['idx'] = df.groupby(['job', 'result']).cumcount()
(df.pivot(index=['job', 'idx'], columns='result', values='time')
.sort_index(level=1)
.droplevel(1)
.reset_index()
.rename_axis(None, axis=1)[['job', 'START', 'END']])
[out]
job START END
0 JOB0 1357 5209
1 JOB2 2405 2379
2 JOB3 4010 6578
3 JOB0 6000 6100
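For anyone who wants to run this end to end, here is a minimal self-contained sketch. The dataframe is rebuilt from the sample rows in the question, and `assign` is used instead of adding the idx column in place; both are choices made for illustration, not part of the original answer. As far as I know, passing a list to `index` in `pivot` needs pandas 1.1 or newer.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "result": ["START", "START", "END", "START", "END", "END", "START", "END"],
    "job": ["JOB0", "JOB2", "JOB2", "JOB3", "JOB0", "JOB3", "JOB0", "JOB0"],
    "time": [1357, 2405, 2379, 4010, 5209, 6578, 6000, 6100],
})

# Number each repeat of a (job, result) pair: 0 for the first run, 1 for the second, ...
idx = df.groupby(["job", "result"]).cumcount()

df2 = (
    df.assign(idx=idx)
      .pivot(index=["job", "idx"], columns="result", values="time")
      .sort_index(level=1)        # order by run number, then by job
      .droplevel(1)               # drop the helper run-number level
      .reset_index()
      .rename_axis(None, axis=1)[["job", "START", "END"]]
)
print(df2)
```

Note that this pairs the n-th START of a job with its n-th END, so it assumes the rows for each job appear in run order.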
If the jobs are unique, that is, there is only 1 START entry and 1 END entry per job, then you can use this. Otherwise, use the answer by Chris A.
import numpy as np
df.pivot_table(index=['job'], columns=['result'], values=['time'], aggfunc=np.max)
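To see why the aggregated version only fits the unique case, here is a small sketch on the sample rows from the question (the dataframe construction and the variable names are mine). With the default aggfunc, pivot_table takes the mean of duplicate entries, and with max it keeps only the largest time, so the two JOB0 runs collapse into a single row either way.

```python
import pandas as pd

# Sample rows copied from the question; JOB0 appears twice
df = pd.DataFrame({
    "result": ["START", "START", "END", "START", "END", "END", "START", "END"],
    "job": ["JOB0", "JOB2", "JOB2", "JOB3", "JOB0", "JOB3", "JOB0", "JOB0"],
    "time": [1357, 2405, 2379, 4010, 5209, 6578, 6000, 6100],
})

# Default aggregation is the mean of duplicate (job, result) entries
mean_table = df.pivot_table(index="job", columns="result", values="time")

# aggfunc="max" keeps only the largest time per (job, result)
max_table = df.pivot_table(index="job", columns="result", values="time", aggfunc="max")

print(mean_table)
print(max_table)
```

The string "max" is used here instead of np.max because recent pandas versions may warn when a NumPy callable is passed as aggfunc; the result should be the same.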