I have the following dataframe, which records the start and end times of different jobs. A small part of the dataframe is shown below.
Dataframe(df):
result | job | time
START | JOB0 | 1357
START | JOB2 | 2405
END | JOB2 | 2379
START | JOB3 | 4010
END | JOB0 | 5209
END | JOB3 | 6578
START | JOB0 | 6000
END | JOB0 | 6100
(Note: the original dataframe has 5 jobs, JOB0 to JOB4.) I want to convert the values (START and END) of the column result into individual columns in the dataframe.
Required Dataframe(df2)
job | START | END
JOB0 | 1357 | 5209
JOB2 | 2405 | 2379
JOB3 | 4010 | 6578
JOB0 | 6000 | 6100
Code
I tried implementing this using pivot_table, but it returns aggregated values, which is not what I need.
df2 = df.pivot_table('time', 'job','result')
Code Output
result | END | START
job
JOB0 | 5.000589e+08 | 5.000636e+08
JOB1 | 4.999141e+08 | 4.999188e+08
JOB2 | 5.001668e+08 | 5.001715e+08
JOB3 | 4.995190e+08 | 4.995187e+08
JOB4 | 5.003238e+08 | 5.003236e+08
How can I attain the required dataframe?
You have repeated jobs (JOB0), so it's better to create a unique id for each occurrence of a job and then pivot on id and job, like this:
df['id'] = df.groupby(['job', 'result']).cumcount()
df2 = df.pivot_table(index=['id','job'], columns='result', values='time')
Output:
result    END  START
id job
0  JOB0  5209   1357
   JOB2  2379   2405
   JOB3  6578   4010
1  JOB0  6100   6000
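For reference, here is a minimal end-to-end sketch of the approach above, using only the sample rows from the question (the real dataframe is larger, so treat this as an illustration). The final reset_index / rename_axis / column selection is an assumed extra step, not part of the answer itself, added only to get the flat job | START | END layout the question asks for.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "result": ["START", "START", "END", "START", "END", "END", "START", "END"],
    "job": ["JOB0", "JOB2", "JOB2", "JOB3", "JOB0", "JOB3", "JOB0", "JOB0"],
    "time": [1357, 2405, 2379, 4010, 5209, 6578, 6000, 6100],
})

# Number each START and each END per job: 0 for the first run, 1 for the second, ...
df["id"] = df.groupby(["job", "result"]).cumcount()

# Each (id, job, result) combination is now unique, so the default mean
# aggregation of pivot_table has only one value per cell to work with.
df2 = (
    df.pivot_table(index=["id", "job"], columns="result", values="time")
      .reset_index()                 # turn id and job back into columns
      .rename_axis(columns=None)     # drop the leftover "result" axis label
      [["job", "START", "END"]]      # keep the columns in the requested order
)
print(df2)
```

Depending on the pandas version, the time values may be printed as floats because pivot_table aggregates with a mean by default.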
Sort the df on time before computing id, for sanity (there could still be issues if runs of the same job overlap each other):
df = df.sort_values(by='time')
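To illustrate why the sort matters, here is a small sketch with a hypothetical shuffled log containing only the two JOB0 runs from the question. The helper name pair_runs is made up for this example. Since cumcount numbers rows in the order they appear, an unsorted frame can pair a START with the END of a different run.

```python
import pandas as pd

# Hypothetical shuffled log: the two JOB0 runs from the question, out of order.
shuffled = pd.DataFrame({
    "result": ["START", "END", "START", "END"],
    "job": ["JOB0"] * 4,
    "time": [6000, 5209, 1357, 6100],
})

def pair_runs(frame):
    """Pair the n-th START of each job with its n-th END, in row order."""
    frame = frame.copy()
    frame["id"] = frame.groupby(["job", "result"]).cumcount()
    return frame.pivot_table(index=["id", "job"], columns="result", values="time")

# Without sorting, START 6000 is paired with END 5209 (two different runs).
unsorted_pairs = pair_runs(shuffled)

# After sorting on time, the runs come out as (1357, 5209) and (6000, 6100).
sorted_pairs = pair_runs(shuffled.sort_values(by="time"))

print(unsorted_pairs)
print(sorted_pairs)
```

Sorting only helps when runs of the same job do not overlap; if a second JOB0 starts before the first one ends, counting by position cannot tell which END belongs to which START.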
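You should be able to use pandas.DataFrame.pivot for this, as long as each job has at most one START and one END row (with repeated jobs such as JOB0, pivot raises a ValueError because the index would contain duplicate entries):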
import pandas as pd
df2 = df.pivot(index="job", columns="result", values="time")
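One caveat, sketched here with the sample rows from the question: DataFrame.pivot does not aggregate, so it refuses to reshape when the same job and result combination occurs more than once. A possible workaround, borrowing the cumcount idea from the other answer, is to add an occurrence counter (named id here) to the index. Passing a list as index is assumed to be supported by the installed pandas version (1.1 or later).

```python
import pandas as pd

# Sample rows copied from the question; JOB0 appears twice.
df = pd.DataFrame({
    "result": ["START", "START", "END", "START", "END", "END", "START", "END"],
    "job": ["JOB0", "JOB2", "JOB2", "JOB3", "JOB0", "JOB3", "JOB0", "JOB0"],
    "time": [1357, 2405, 2379, 4010, 5209, 6578, 6000, 6100],
})

try:
    df.pivot(index="job", columns="result", values="time")
    pivot_failed = False
except ValueError as exc:
    # JOB0 has two START rows and two END rows, so the reshape is ambiguous.
    pivot_failed = True
    print(exc)

# Make every (id, job, result) combination unique, then pivot without aggregation.
df["id"] = df.groupby(["job", "result"]).cumcount()
df2 = df.pivot(index=["id", "job"], columns="result", values="time")
print(df2)
```

Because pivot only reshapes, the time values keep their original integer type here.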