I have a pandas dataframe and would like to split each row containing multiple tasks into a new row.
The dataframe columns are:
start_t_i = start time of a task.
end_t_i = end time of a task.
weight_i = cost of a task.
For example, suppose I have the following dataframe df1:
task_name | start_t1 | end_t1 | weight_1 | start_t_2 | end_t2 | weight_2.. | start_t_k | end_t_k | weight_k |
---|---|---|---|---|---|---|---|---|---|
john | 5 | 7 | 1 | 9 | 10 | 9 | |||
sally | 3 | 4 | 1 | 8 | 11 | 7 | 19 | 21 | 1 |
tom | 1 | 2 | 3 |
I would like to transform it into the following df2:
task_name | start_t | end_t | weight |
---|---|---|---|
john | 5 | 7 | 1 |
john | 9 | 10 | 9 |
sally | 3 | 4 | 1 |
sally | 8 | 11 | 7 |
sally | 19 | 21 | 1 |
tom | 1 | 2 | 3 |
so far I managed to transform df1 manually into df2 by assuming each person has only a maximum of two tasks. my question is how can I get a df such as df2 from df1 when there are up to k tasks for each person.
If we assume that every task has a start_t, end_t and weight_t
columns then we can use:
cols = list(df.columns[1:])
task_col = df.columns[0]
dfs = []
for i in range(0, len(cols), 3):
subset_cols = cols[i:i+3]
rename_cols = {subset_cols[0]: 'start_t', subset_cols[1]:'end_t', subset_cols[2]:'weight'}
dfs.append(df[[task_col] + subset_cols].rename(columns=rename_cols))
transformed_df = pd.concat(dfs, ignore_index=True).sort_values(by='task_name').dropna(subset=['start_t', 'end_t', 'weight']).reset_index(drop=True)
OUTPUT:
task_name start_t end_t weight
0 john 5.0 7.0 1.0
1 john 9.0 10.0 9.0
2 sally 3.0 4.0 1.0
3 sally 8.0 11.0 7.0
4 sally 19.0 21.0 1.0
5 tom 1.0 2.0 3.0
You can melt
the columns, extract the valid column names and pivot
:
df = df.replace("", np.NaN).melt("task_name").dropna()
df["variable"] = df["variable"].str.extract("(^[A-Za-z]+)_")
print (df.assign(count=df.groupby(["task_name", "variable"]).cumcount())
.pivot(["count", "task_name"], "variable", "value")
.reset_index(0, drop=True).sort_index())
variable end start weight
task_name
john 7.0 5.0 1.0
john 10.0 9.0 9.0
sally 4.0 3.0 1.0
sally 11.0 8.0 7.0
sally 21.0 19.0 1.0
tom 2.0 1.0 3.0
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.