How to convert multiple columns into individual rows/values in Pandas?

Question

I'm sure this question has been answered before, but I don't know what this operation is called, so my searches have come up empty. It is almost like a reverse pivot table.

Let's say I have the following payroll data:

import pandas as pd

data = [
    {'employee': 1, 'date': '2020-01-04', 'reg': 8, 'ot': 0, 'dt': 0},
    {'employee': 1, 'date': '2020-01-05', 'reg': 4, 'ot': 4, 'dt': 0},
    {'employee': 1, 'date': '2020-01-06', 'reg': 0, 'ot': 0, 'dt': 4},
    {'employee': 2, 'date': '2020-01-04', 'reg': 6, 'ot': 2, 'dt': 0},
    {'employee': 2, 'date': '2020-01-05', 'reg': 3, 'ot': 5, 'dt': 0},
    {'employee': 2, 'date': '2020-01-06', 'reg': 0, 'ot': 4, 'dt': 0},
]

data_df = pd.DataFrame(data)

What I need to do is break out each rate ('reg', 'ot', and 'dt') for each employee/date into its own row, with one column for the rate label and one column for the number of hours, keeping the other non-rate columns. Additionally, I don't want a row for any rate where the value is zero. For the data above, I am looking to get:

result = [
    {'employee': 1, 'date': '2020-01-04', 'rate': 'reg', 'hours': 8},
    {'employee': 1, 'date': '2020-01-05', 'rate': 'reg', 'hours': 4},
    {'employee': 1, 'date': '2020-01-05', 'rate': 'ot', 'hours': 4},
    {'employee': 1, 'date': '2020-01-06', 'rate': 'dt', 'hours': 4},
    {'employee': 2, 'date': '2020-01-04', 'rate': 'reg', 'hours': 6},
    {'employee': 2, 'date': '2020-01-04', 'rate': 'ot', 'hours': 2},
    {'employee': 2, 'date': '2020-01-05', 'rate': 'reg', 'hours': 3},
    {'employee': 2, 'date': '2020-01-05', 'rate': 'ot', 'hours': 5},
    {'employee': 2, 'date': '2020-01-06', 'rate': 'ot', 'hours': 4},
]

result_df = pd.DataFrame(result)

Any thoughts on how to accomplish this would be greatly appreciated!

Answer 1

Try using melt:

(data_df.melt(['employee','date'], 
             var_name='rate', 
             value_name='hours')
        .query('hours != 0'))

Output:

    employee        date rate  hours
0          1  2020-01-04  reg      8
1          1  2020-01-05  reg      4
3          2  2020-01-04  reg      6
4          2  2020-01-05  reg      3
7          1  2020-01-05   ot      4
9          2  2020-01-04   ot      2
10         2  2020-01-05   ot      5
11         2  2020-01-06   ot      4
14         1  2020-01-06   dt      4
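The melt output above has the right rows but is grouped by rate, whereas the question lists them by employee and date with the rates in reg, ot, dt order. If that ordering matters, one possible sketch is below. The names `rates` and `long_df` are illustrative and not part of the original answer, and the ordered categorical is just one way to get a non-alphabetical sort.

```python
import pandas as pd

data = [
    {'employee': 1, 'date': '2020-01-04', 'reg': 8, 'ot': 0, 'dt': 0},
    {'employee': 1, 'date': '2020-01-05', 'reg': 4, 'ot': 4, 'dt': 0},
    {'employee': 1, 'date': '2020-01-06', 'reg': 0, 'ot': 0, 'dt': 4},
    {'employee': 2, 'date': '2020-01-04', 'reg': 6, 'ot': 2, 'dt': 0},
    {'employee': 2, 'date': '2020-01-05', 'reg': 3, 'ot': 5, 'dt': 0},
    {'employee': 2, 'date': '2020-01-06', 'reg': 0, 'ot': 4, 'dt': 0},
]
data_df = pd.DataFrame(data)

# Assumed rate order within each employee/date (taken from the question's example).
rates = ['reg', 'ot', 'dt']

long_df = (
    data_df.melt(id_vars=['employee', 'date'],
                 value_vars=rates,
                 var_name='rate',
                 value_name='hours')
    .query('hours != 0')  # drop rates with zero hours
    # An ordered categorical makes the sort follow reg, ot, dt
    # instead of alphabetical order.
    .assign(rate=lambda d: pd.Categorical(d['rate'], categories=rates, ordered=True))
    .sort_values(['employee', 'date', 'rate'])
    .reset_index(drop=True)  # renumber rows 0..n-1 after filtering and sorting
)
print(long_df)
```

The dates here are ISO-formatted strings, so sorting them as text happens to give chronological order; with other date formats, converting the column with `pd.to_datetime` first would be safer.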

Answer 2

This should do the trick:

data_df = data_df.set_index(["employee", "date"]).stack().reset_index().rename(columns={"level_2": "rate", 0: "hours"})

Output:

    employee        date rate  hours
0          1  2020-01-04  reg      8
1          1  2020-01-04   ot      0
2          1  2020-01-04   dt      0
3          1  2020-01-05  reg      4
4          1  2020-01-05   ot      4
5          1  2020-01-05   dt      0
6          1  2020-01-06  reg      0
7          1  2020-01-06   ot      0
8          1  2020-01-06   dt      4
9          2  2020-01-04  reg      6
10         2  2020-01-04   ot      2
11         2  2020-01-04   dt      0
12         2  2020-01-05  reg      3
13         2  2020-01-05   ot      5
14         2  2020-01-05   dt      0
15         2  2020-01-06  reg      0
16         2  2020-01-06   ot      4
17         2  2020-01-06   dt      0
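Note that the stack output above still contains the rows with zero hours, which the question wanted removed. A minimal sketch of one way to finish the job is to filter after stacking. The name `long_df` is illustrative, and the filter step is an addition, not part of the original answer.

```python
import pandas as pd

data = [
    {'employee': 1, 'date': '2020-01-04', 'reg': 8, 'ot': 0, 'dt': 0},
    {'employee': 1, 'date': '2020-01-05', 'reg': 4, 'ot': 4, 'dt': 0},
    {'employee': 1, 'date': '2020-01-06', 'reg': 0, 'ot': 0, 'dt': 4},
    {'employee': 2, 'date': '2020-01-04', 'reg': 6, 'ot': 2, 'dt': 0},
    {'employee': 2, 'date': '2020-01-05', 'reg': 3, 'ot': 5, 'dt': 0},
    {'employee': 2, 'date': '2020-01-06', 'reg': 0, 'ot': 4, 'dt': 0},
]
data_df = pd.DataFrame(data)

long_df = (
    data_df.set_index(['employee', 'date'])
    .stack()         # rate columns become an inner index level
    .reset_index()   # the unnamed level becomes 'level_2', the values column 0
    .rename(columns={'level_2': 'rate', 0: 'hours'})
)

# Added step: keep only rows with non-zero hours, then renumber the index.
long_df = long_df[long_df['hours'] != 0].reset_index(drop=True)
print(long_df)
```

Because stack keeps each employee/date together and walks the rate columns left to right, the filtered frame comes out in the same row order as the `result` list in the question.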

Answer 1 (score 7, accepted, 2020-01-21 17:26:01)

Answer 2 (score 1, 2020-01-21 17:27:24)