简体   繁体   中英

split pandas dataframe row into multiple rows by tasks

I have a pandas dataframe and would like to split each row containing multiple tasks into a new row.

The dataframe columns are:
start_t_i = start time of a task.
end_t_i = end time of a task.
weight_i = cost of a task.

For example, suppose I have the following dataframe df1:

task_name start_t1 end_t1 weight_1 start_t_2 end_t2 weight_2.. start_t_k end_t_k weight_k
john 5 7 1 9 10 9
sally 3 4 1 8 11 7 19 21 1
tom 1 2 3

I would like to transform it into the following df2:

task_name start_t end_t weight
john 5 7 1
john 9 10 9
sally 3 4 1
sally 8 11 7
sally 19 21 1
tom 1 2 3

so far I managed to transform df1 manually into df2 by assuming each person has only a maximum of two tasks. my question is how can I get a df such as df2 from df1 when there are up to k tasks for each person.

If we assume that every task has a start_t, end_t and weight_t columns then we can use:

cols = list(df.columns[1:])
task_col = df.columns[0]
dfs = []
for i in range(0, len(cols), 3):
    subset_cols = cols[i:i+3]
    rename_cols = {subset_cols[0]: 'start_t', subset_cols[1]:'end_t', subset_cols[2]:'weight'}
    dfs.append(df[[task_col] + subset_cols].rename(columns=rename_cols))

transformed_df = pd.concat(dfs, ignore_index=True).sort_values(by='task_name').dropna(subset=['start_t', 'end_t', 'weight']).reset_index(drop=True)

OUTPUT:

  task_name  start_t  end_t  weight
0      john      5.0    7.0     1.0
1      john      9.0   10.0     9.0
2     sally      3.0    4.0     1.0
3     sally      8.0   11.0     7.0
4     sally     19.0   21.0     1.0
5       tom      1.0    2.0     3.0

You can melt the columns, extract the valid column names and pivot :

df = df.replace("", np.NaN).melt("task_name").dropna()
df["variable"] = df["variable"].str.extract("(^[A-Za-z]+)_")

print (df.assign(count=df.groupby(["task_name", "variable"]).cumcount())
         .pivot(["count", "task_name"], "variable", "value")
         .reset_index(0, drop=True).sort_index())

variable    end  start  weight
task_name                     
john        7.0    5.0     1.0
john       10.0    9.0     9.0
sally       4.0    3.0     1.0
sally      11.0    8.0     7.0
sally      21.0   19.0     1.0
tom         2.0    1.0     3.0

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM