
How to aggregate a pandas pivot table across subrows and subcolumns

I'm using pandas in Python to pivot some data, and I want to be able to perform two types of aggregation across parts of my pivot tables. I'm aware I can use margins to aggregate across all rows/columns, but I want to aggregate multiple rows (not all of them) within a single column, or multiple columns within a single row. How do I best aggregate subrows and subcolumns in pandas?

Example code setup:

import pandas as pds

#Dataset
rows = [
    [1, 'Factory_1', 'crusher', 'electricity_usage', 15],
    [2, 'Factory_1', 'mixer', 'electricity_usage', 11],
    [3, 'Factory_1', 'turner', 'electricity_usage', 12],
    [4, 'Factory_2', 'crusher', 'electricity_usage', 2],
    [5, 'Factory_2', 'mixer', 'electricity_usage', 7],
    [6, 'Factory_2', 'turner', 'electricity_usage', 13],
    [7, 'Factory_1', 'crusher', 'running_hours', 6],
    [8, 'Factory_1', 'mixer', 'running_hours', 5],
    [9, 'Factory_1', 'turner', 'running_hours', 5],
    [10, 'Factory_2', 'crusher', 'running_hours', 1],
    [11, 'Factory_2', 'mixer', 'running_hours', 3],
    [12, 'Factory_2', 'turner', 'running_hours', 6]
]

dataFrame = pds.DataFrame(rows, columns=["id","Location","Type","recorded_type","value"])

#Pivot Table 1: Form multi row aggregation across a single column
ptable_1 = pds.pivot_table(data=dataFrame,index=['Location', 'Type'], columns=["recorded_type"], values=['value'])
print(ptable_1)

#Pivot Table 2: Form multi column aggregation across a single row
ptable_2 = pds.pivot_table(data=dataFrame,index=['recorded_type'], columns=["Location", "Type"], values=['value'])
print(ptable_2)

Below is my attempt at aggregating pivot table 1 across multiple rows in a single column. I'm trying to sum the recorded values of all machines per location. Can this be done any better?

#Form aggregation across multiple rows in a single column

df1 = ptable_1.groupby(level=[0]).sum()
df1['Type'] = ["all", "all"]
#Reset index so Location is removed from the current index
df1.reset_index(inplace=True)
#Set multi-index of location and type
df1.set_index(['Location', 'Type'], inplace=True)
#Concat both dataframes
aggregated_table_1 = pds.concat([ptable_1.reset_index(),df1.reset_index()], ignore_index=True)
#Sort values by Location, so the appended subtotal rows land in the correct position
aggregated_table_1.sort_values('Location', inplace=True)

print(aggregated_table_1)
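A somewhat tighter variant of the same idea (a sketch, not from the original post; the `subtotals` name is mine): build the 'all' rows with a `groupby`, relabel them via `MultiIndex.from_product`, and concatenate.

```python
import pandas as pds

# dataset from the question
rows = [
    [1, 'Factory_1', 'crusher', 'electricity_usage', 15], [2, 'Factory_1', 'mixer', 'electricity_usage', 11],
    [3, 'Factory_1', 'turner', 'electricity_usage', 12], [4, 'Factory_2', 'crusher', 'electricity_usage', 2],
    [5, 'Factory_2', 'mixer', 'electricity_usage', 7], [6, 'Factory_2', 'turner', 'electricity_usage', 13],
    [7, 'Factory_1', 'crusher', 'running_hours', 6], [8, 'Factory_1', 'mixer', 'running_hours', 5],
    [9, 'Factory_1', 'turner', 'running_hours', 5], [10, 'Factory_2', 'crusher', 'running_hours', 1],
    [11, 'Factory_2', 'mixer', 'running_hours', 3], [12, 'Factory_2', 'turner', 'running_hours', 6],
]
dataFrame = pds.DataFrame(rows, columns=["id", "Location", "Type", "recorded_type", "value"])
ptable_1 = pds.pivot_table(data=dataFrame, index=['Location', 'Type'],
                           columns=["recorded_type"], values=['value'])

# Sum each Location's rows and relabel the result as Type='all'
subtotals = ptable_1.groupby(level='Location').sum()
subtotals.index = pds.MultiIndex.from_product([subtotals.index, ['all']],
                                              names=['Location', 'Type'])

# Append the subtotal rows, then sort by Location only so each
# group's 'all' row stays below its machine rows
aggregated_table_1 = pds.concat([ptable_1, subtotals])
aggregated_table_1 = aggregated_table_1.sort_index(level='Location', sort_remaining=False)
print(aggregated_table_1)
```

This avoids the reset/set index round-trip: the subtotal frame already carries the right (Location, Type) MultiIndex before the concat.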

For example, I'm trying to aggregate the electricity usage of all machine types for a particular factory, so the aggregate sits in the Type column with the value 'all'. The expected output for ptable_1:

+---------------+-----------+---------+-------------------+---------------+
|               | Location  |  Type   |       value       |     value     |
+---------------+-----------+---------+-------------------+---------------+
| recorded_type |           |         | electricity_usage | running_hours |
|               | Factory_1 | crusher | 15                | 6             |
|               | Factory_1 | mixer   | 11                | 5             |
|               | Factory_1 | turner  | 12                | 5             |
|               | Factory_1 | all     | 38                | 16            |
|               | Factory_2 | crusher | 2                 | 1             |
|               | Factory_2 | mixer   | 7                 | 3             |
|               | Factory_2 | turner  | 13                | 6             |
|               | Factory_2 | all     | 22                | 10            |
+---------------+-----------+---------+-------------------+---------------+

Secondly, I'm not sure how to aggregate across subcolumns as below to produce a sum over all columns per recorded_type for ptable_2. The aggregate is a new column with Type set to 'all'.

The expected output for ptable_2:

+-------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
|     Location      | Factory_1 | Factory_1 | Factory_1 | Factory_1 | Factory_2 | Factory_2 | Factory_2 | Factory_2 |
+-------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
| Type              | crusher   | mixer     | turner    | all       | crusher   | mixer     | turner    | all       |
| recorded_type     |           |           |           |           |           |           |           |           |
| electricity_usage | 15        | 11        | 12        | 38        | 2         | 7         | 13        | 22        |
| running_hours     | 6         | 5         | 5         | 16        | 1         | 3         | 6         | 10        |
+-------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
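For the column-wise case, one approach (a sketch, not from the original thread; the `col_totals` name is mine) is to transpose, group by the Location level, transpose back, and append the result as Type='all' columns:

```python
import pandas as pds

# dataset from the question
rows = [
    [1, 'Factory_1', 'crusher', 'electricity_usage', 15], [2, 'Factory_1', 'mixer', 'electricity_usage', 11],
    [3, 'Factory_1', 'turner', 'electricity_usage', 12], [4, 'Factory_2', 'crusher', 'electricity_usage', 2],
    [5, 'Factory_2', 'mixer', 'electricity_usage', 7], [6, 'Factory_2', 'turner', 'electricity_usage', 13],
    [7, 'Factory_1', 'crusher', 'running_hours', 6], [8, 'Factory_1', 'mixer', 'running_hours', 5],
    [9, 'Factory_1', 'turner', 'running_hours', 5], [10, 'Factory_2', 'crusher', 'running_hours', 1],
    [11, 'Factory_2', 'mixer', 'running_hours', 3], [12, 'Factory_2', 'turner', 'running_hours', 6],
]
dataFrame = pds.DataFrame(rows, columns=["id", "Location", "Type", "recorded_type", "value"])
ptable_2 = pds.pivot_table(data=dataFrame, index=['recorded_type'],
                           columns=["Location", "Type"], values=['value'])

# Sum across the Type level per Location: transpose so the column
# MultiIndex becomes a row index we can group on, then transpose back
col_totals = ptable_2.T.groupby(level='Location').sum().T
col_totals.columns = pds.MultiIndex.from_tuples(
    [('value', loc, 'all') for loc in col_totals.columns],
    names=ptable_2.columns.names)

# Append the subtotal columns, then sort by Location only so each
# group's 'all' column sits after its machine columns
aggregated_table_2 = pds.concat([ptable_2, col_totals], axis=1)
aggregated_table_2 = aggregated_table_2.sort_index(axis=1, level='Location', sort_remaining=False)
print(aggregated_table_2)
```

The transpose trick sidesteps `groupby(axis=1)`, which newer pandas versions deprecate.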

Edit 1: Here is my output straight out of Python, applying Serge de Gosson de Varennes' approach of melt() with default params. I lose track of the recorded_type for each row, which is replaced by a NaN-named column. Should I be trying to aggregate from this to form my expected output?

Df_ex1 = dfex1.melt() # Expected output 1
      NaN      recorded_type  value
0   value  electricity_usage     15
1   value  electricity_usage     11
2   value  electricity_usage     12
3   value  electricity_usage      2
4   value  electricity_usage      7
5   value  electricity_usage     13
6   value      running_hours      6
7   value      running_hours      5
8   value      running_hours      5
9   value      running_hours      1
10  value      running_hours      3
11  value      running_hours      6


Df_exp2 = dfex2.melt() # Expected output 2
      NaN   Location     Type  value
0   value  Factory_1  crusher     15
1   value  Factory_1  crusher      6
2   value  Factory_1    mixer     11
3   value  Factory_1    mixer      5
4   value  Factory_1   turner     12
5   value  Factory_1   turner      5
6   value  Factory_2  crusher      2
7   value  Factory_2  crusher      1
8   value  Factory_2    mixer      7
9   value  Factory_2    mixer      3
10  value  Factory_2   turner     13
11  value  Factory_2   turner      6
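One way to keep the identifying labels when melting (a sketch, not from the original thread; the `flat` and `long_form` names are mine): drop the constant 'value' column level and move the index into ordinary columns first, so `melt` can carry them through as `id_vars`.

```python
import pandas as pds

# dataset from the question
rows = [
    [1, 'Factory_1', 'crusher', 'electricity_usage', 15], [2, 'Factory_1', 'mixer', 'electricity_usage', 11],
    [3, 'Factory_1', 'turner', 'electricity_usage', 12], [4, 'Factory_2', 'crusher', 'electricity_usage', 2],
    [5, 'Factory_2', 'mixer', 'electricity_usage', 7], [6, 'Factory_2', 'turner', 'electricity_usage', 13],
    [7, 'Factory_1', 'crusher', 'running_hours', 6], [8, 'Factory_1', 'mixer', 'running_hours', 5],
    [9, 'Factory_1', 'turner', 'running_hours', 5], [10, 'Factory_2', 'crusher', 'running_hours', 1],
    [11, 'Factory_2', 'mixer', 'running_hours', 3], [12, 'Factory_2', 'turner', 'running_hours', 6],
]
dataFrame = pds.DataFrame(rows, columns=["id", "Location", "Type", "recorded_type", "value"])
ptable_1 = pds.pivot_table(data=dataFrame, index=['Location', 'Type'],
                           columns=["recorded_type"], values=['value'])

# Drop the constant top column level ('value', the source of the NaN header)
# and reset the index so Location and Type become plain columns
flat = ptable_1.droplevel(0, axis=1).reset_index()
long_form = flat.melt(id_vars=['Location', 'Type'], var_name='recorded_type')
print(long_form)
```

This round-trips the pivot table back to a tidy long form with nothing lost, which is then easy to group and aggregate.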

You almost got it right: you need to melt your dataframe:

import pandas as pds
rows = [
    [1, 'Factory_1', 'crusher', 'electricity_usage', 15],
    [2, 'Factory_1', 'mixer', 'electricity_usage', 11],
    [3, 'Factory_1', 'turner', 'electricity_usage', 12],
    [4, 'Factory_2', 'crusher', 'electricity_usage', 2],
    [5, 'Factory_2', 'mixer', 'electricity_usage', 7],
    [6, 'Factory_2', 'turner', 'electricity_usage', 13],
    [7, 'Factory_1', 'crusher', 'running_hours', 6],
    [8, 'Factory_1', 'mixer', 'running_hours', 5],
    [9, 'Factory_1', 'turner', 'running_hours', 5],
    [10, 'Factory_2', 'crusher', 'running_hours', 1],
    [11, 'Factory_2', 'mixer', 'running_hours', 3],
    [12, 'Factory_2', 'turner', 'running_hours', 6]
]

dataFrame = pds.DataFrame(rows, columns=["id","Location","Type","recorded_type","value"])


ptable_1 = pds.pivot_table(data=dataFrame, index=['Location', 'Type'], columns=["recorded_type"], values=['value'])
ptable_2 = pds.pivot_table(data=dataFrame, index=['recorded_type'], columns=["Location", "Type"], values=['value'])

dfex1 = pds.DataFrame(ptable_1)
dfex2 = pds.DataFrame(ptable_2)

gives you

Df_ex1 = dfex1.melt()   # Expected output 1
Df_exp2 = dfex2.melt()  # Expected output 2
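Whichever reshaping route you take, note that the subtotal numbers in the expected outputs can also be computed directly from the raw data with a plain groupby, without melting a pivot table at all (a sketch, not part of the original answer; the `totals` name is mine):

```python
import pandas as pds

# dataset from the question
rows = [
    [1, 'Factory_1', 'crusher', 'electricity_usage', 15], [2, 'Factory_1', 'mixer', 'electricity_usage', 11],
    [3, 'Factory_1', 'turner', 'electricity_usage', 12], [4, 'Factory_2', 'crusher', 'electricity_usage', 2],
    [5, 'Factory_2', 'mixer', 'electricity_usage', 7], [6, 'Factory_2', 'turner', 'electricity_usage', 13],
    [7, 'Factory_1', 'crusher', 'running_hours', 6], [8, 'Factory_1', 'mixer', 'running_hours', 5],
    [9, 'Factory_1', 'turner', 'running_hours', 5], [10, 'Factory_2', 'crusher', 'running_hours', 1],
    [11, 'Factory_2', 'mixer', 'running_hours', 3], [12, 'Factory_2', 'turner', 'running_hours', 6],
]
dataFrame = pds.DataFrame(rows, columns=["id", "Location", "Type", "recorded_type", "value"])

# Per-Location totals over all machine types, straight from the raw rows
totals = dataFrame.groupby(['Location', 'recorded_type'])['value'].sum().unstack()
print(totals)
```

These are exactly the values that fill the Type='all' rows/columns in the question's expected outputs.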

