How to aggregate a pandas pivot table across subrows and subcolumns
I'm using pandas in Python to pivot some data, and I want to be able to perform two types of aggregation across parts of my pivot tables. I'm aware that I can use margins to perform an aggregation across all rows/columns, but I want to aggregate multiple rows (not all of them) across a single column, or multiple columns across a single row. How do I best aggregate subrows and subcolumns in pandas?
Example code setup:
import pandas as pds

#Dataset
rows = [
[1, 'Factory_1', 'crusher', 'electricity_usage', 15],
[2, 'Factory_1', 'mixer', 'electricity_usage', 11],
[3, 'Factory_1', 'turner', 'electricity_usage', 12],
[4, 'Factory_2', 'crusher', 'electricity_usage', 2],
[5, 'Factory_2', 'mixer', 'electricity_usage', 7],
[6, 'Factory_2', 'turner', 'electricity_usage', 13],
[7, 'Factory_1', 'crusher', 'running_hours', 6],
[8, 'Factory_1', 'mixer', 'running_hours', 5],
[9, 'Factory_1', 'turner', 'running_hours', 5],
[10, 'Factory_2', 'crusher', 'running_hours', 1],
[11, 'Factory_2', 'mixer', 'running_hours', 3],
[12, 'Factory_2', 'turner', 'running_hours', 6]
]
dataFrame = pds.DataFrame(rows, columns=["id","Location","Type","recorded_type","value"])
#Pivot Table 1: Form multi row aggregation across a single column
ptable_1 = pds.pivot_table(data=dataFrame, index=['Location', 'Type'], columns=["recorded_type"], values=['value'])
print(ptable_1)
#Pivot Table 2: Form multi column aggregation across a single row
ptable_2 = pds.pivot_table(data=dataFrame, index=['recorded_type'], columns=["Location", "Type"], values=['value'])
print(ptable_2)
Below is my attempt at aggregating pivot table 1 across multiple rows in a single column. I'm trying to aggregate the sum of all machines' recorded values per location. Can this be done any better?
#Form aggregation across multiple rows in a single column
df1 = ptable_1.groupby(level=[0]).sum()
df1['Type'] = ["all", "all"]
#Reset index so machine_location is removed from current index
df1.reset_index(inplace=True)
#Set multi-index of location and type
df1.set_index(['Location', 'Type'], inplace=True)
#Concat both dataframes
aggregated_table_1 = pds.concat([ptable_1.reset_index(),df1.reset_index()], ignore_index=True)
#Sort values by location, so the appended subtotal rows land in the correct position
aggregated_table_1.sort_values('Location', inplace=True)
print(aggregated_table_1)
For example, I'm trying to aggregate the electricity usage of all machine types for a particular factory, so the aggregate appears in the Type column with the type 'all'. The expected output for ptable_1:
+---------------+-----------+---------+-------------------+---------------+
| | Location | Type | value | value |
+---------------+-----------+---------+-------------------+---------------+
| recorded_type | | | electricity_usage | running_hours |
| | Factory_1 | crusher | 15 | 6 |
| | Factory_1 | mixer | 11 | 5 |
| | Factory_1 | turner | 12 | 5 |
| | Factory_1 | all | 38 | 16 |
| | Factory_2 | crusher | 2 | 1 |
| | Factory_2 | mixer | 7 | 3 |
| | Factory_2 | turner | 13 | 6 |
| | Factory_2 | all | 22 | 10 |
+---------------+-----------+---------+-------------------+---------------+
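The multi-step attempt above can be condensed: build the subtotal rows with a single groupby over the Location level, give them an 'all' Type label via a rebuilt MultiIndex, and concat. A minimal sketch (variable names like `frame` and `subtotals` are mine, not from the question):

```python
import pandas as pds

rows = [
    [1, 'Factory_1', 'crusher', 'electricity_usage', 15],
    [2, 'Factory_1', 'mixer', 'electricity_usage', 11],
    [3, 'Factory_1', 'turner', 'electricity_usage', 12],
    [4, 'Factory_2', 'crusher', 'electricity_usage', 2],
    [5, 'Factory_2', 'mixer', 'electricity_usage', 7],
    [6, 'Factory_2', 'turner', 'electricity_usage', 13],
    [7, 'Factory_1', 'crusher', 'running_hours', 6],
    [8, 'Factory_1', 'mixer', 'running_hours', 5],
    [9, 'Factory_1', 'turner', 'running_hours', 5],
    [10, 'Factory_2', 'crusher', 'running_hours', 1],
    [11, 'Factory_2', 'mixer', 'running_hours', 3],
    [12, 'Factory_2', 'turner', 'running_hours', 6],
]
frame = pds.DataFrame(rows, columns=["id", "Location", "Type", "recorded_type", "value"])
ptable_1 = pds.pivot_table(frame, index=["Location", "Type"],
                           columns=["recorded_type"], values=["value"])

# Subtotals per location: sum over the 'Type' level of the index.
subtotals = ptable_1.groupby(level="Location").sum()

# Reinstate a 'Type' level labelled 'all' so both frames share one index shape.
subtotals.index = pds.MultiIndex.from_product(
    [subtotals.index, ["all"]], names=["Location", "Type"])

# Stack the subtotal rows under the detail rows; sorting only on 'Location'
# keeps each 'all' row after its location's detail rows.
aggregated = pds.concat([ptable_1, subtotals]).sort_index(
    level="Location", sort_remaining=False)
print(aggregated)
```

This avoids the hard-coded `["all", "all"]` list and the reset_index/set_index round trip, and keeps working if more locations are added.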
Secondly, I'm not sure how to aggregate across subcolumns, as below, to produce a sum of all columns per type for ptable_2. The aggregate is a new column with Type 'all'.
The expected output for ptable_2:
+-------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
| Location | Factory_1 | Factory_1 | Factory_1 | Factory_1 | Factory_2 | Factory_2 | Factory_2 | Factory_2 |
+-------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
| Type | crusher | mixer | turner | all | crusher | mixer | turner | all |
| recorded_type | | | | | | | | |
| electricity_usage | 15 | 11 | 12 | 38 | 2 | 7 | 13 | 22 |
| running_hours | 6 | 5 | 5 | 16 | 1 | 3 | 6 | 10 |
+-------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
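For ptable_2 the same idea can be applied on the column axis: transpose, group on the first two column levels (the 'value' marker and Location), sum, and re-attach a Type level labelled 'all'. A minimal sketch under the same setup (`totals` and `result` are my names):

```python
import pandas as pds

rows = [
    [1, 'Factory_1', 'crusher', 'electricity_usage', 15],
    [2, 'Factory_1', 'mixer', 'electricity_usage', 11],
    [3, 'Factory_1', 'turner', 'electricity_usage', 12],
    [4, 'Factory_2', 'crusher', 'electricity_usage', 2],
    [5, 'Factory_2', 'mixer', 'electricity_usage', 7],
    [6, 'Factory_2', 'turner', 'electricity_usage', 13],
    [7, 'Factory_1', 'crusher', 'running_hours', 6],
    [8, 'Factory_1', 'mixer', 'running_hours', 5],
    [9, 'Factory_1', 'turner', 'running_hours', 5],
    [10, 'Factory_2', 'crusher', 'running_hours', 1],
    [11, 'Factory_2', 'mixer', 'running_hours', 3],
    [12, 'Factory_2', 'turner', 'running_hours', 6],
]
frame = pds.DataFrame(rows, columns=["id", "Location", "Type", "recorded_type", "value"])
ptable_2 = pds.pivot_table(frame, index=["recorded_type"],
                           columns=["Location", "Type"], values=["value"])

# Sum across the 'Type' column level for each location (transpose, group
# on the first two index levels, transpose back).
totals = ptable_2.T.groupby(level=[0, 1]).sum().T

# Re-attach a third column level labelled 'all' so the columns line up.
totals.columns = pds.MultiIndex.from_tuples(
    [(v, loc, "all") for v, loc in totals.columns],
    names=ptable_2.columns.names)

# Append the 'all' columns; sorting only on 'Location' keeps them after
# each factory's detail columns.
result = pds.concat([ptable_2, totals], axis=1).sort_index(
    axis=1, level="Location", sort_remaining=False)
print(result)
```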
Edit 1: Here is my output straight out of Python, applying Serge de Gosson de Varennes's melt() approach with default params. I lose track of which row each value came from (the pivot's index is dropped), and an unnamed NaN column appears in its place. Should I be trying to aggregate from this to form my expected output?
Df_ex1 = dfex1.melt() # Expected output 1
NaN recorded_type value
0 value electricity_usage 15
1 value electricity_usage 11
2 value electricity_usage 12
3 value electricity_usage 2
4 value electricity_usage 7
5 value electricity_usage 13
6 value running_hours 6
7 value running_hours 5
8 value running_hours 5
9 value running_hours 1
10 value running_hours 3
11 value running_hours 6
Df_exp2 = dfex2.melt() # Expected output 2
NaN Location Type value
0 value Factory_1 crusher 15
1 value Factory_1 crusher 6
2 value Factory_1 mixer 11
3 value Factory_1 mixer 5
4 value Factory_1 turner 12
5 value Factory_1 turner 5
6 value Factory_2 crusher 2
7 value Factory_2 crusher 1
8 value Factory_2 mixer 7
9 value Factory_2 mixer 3
10 value Factory_2 turner 13
11 value Factory_2 turner 6
You almost got it right: you need to melt your dataframe:
import pandas as pds
rows = [
[1, 'Factory_1', 'crusher', 'electricity_usage', 15],
[2, 'Factory_1', 'mixer', 'electricity_usage', 11],
[3, 'Factory_1', 'turner', 'electricity_usage', 12],
[4, 'Factory_2', 'crusher', 'electricity_usage', 2],
[5, 'Factory_2', 'mixer', 'electricity_usage', 7],
[6, 'Factory_2', 'turner', 'electricity_usage', 13],
[7, 'Factory_1', 'crusher', 'running_hours', 6],
[8, 'Factory_1', 'mixer', 'running_hours', 5],
[9, 'Factory_1', 'turner', 'running_hours', 5],
[10, 'Factory_2', 'crusher', 'running_hours', 1],
[11, 'Factory_2', 'mixer', 'running_hours', 3],
[12, 'Factory_2', 'turner', 'running_hours', 6]
]
dataFrame = pds.DataFrame(rows, columns=["id","Location","Type","recorded_type","value"])
ptable_1 = pds.pivot_table(data=dataFrame, index=['Location', 'Type'], columns=["recorded_type"], values=['value'])
ptable_2 = pds.pivot_table(data=dataFrame, index=['recorded_type'], columns=["Location", "Type"], values=['value'])
df = pds.DataFrame(ptable_1)
dfex1 = pds.DataFrame(ptable_1)
dfex2 = pds.DataFrame(ptable_2)
gives you
Df_ex1 = dfex1.melt() # Expected output 1
Df_exp2 = dfex2.melt() # Expected output 2
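If you do go down the melt route, the index loss shown in Edit 1 can be avoided: melt drops the index by default, but ignore_index=False (available since pandas 1.1) keeps the (Location, Type) index, after which an ordinary groupby recovers the per-location sums. A sketch under the same setup (`melted` and `per_location` are my names; value_name="reading" is used so the new column doesn't clash with the 'value' level already in the pivot's columns):

```python
import pandas as pds

rows = [
    [1, 'Factory_1', 'crusher', 'electricity_usage', 15],
    [2, 'Factory_1', 'mixer', 'electricity_usage', 11],
    [3, 'Factory_1', 'turner', 'electricity_usage', 12],
    [4, 'Factory_2', 'crusher', 'electricity_usage', 2],
    [5, 'Factory_2', 'mixer', 'electricity_usage', 7],
    [6, 'Factory_2', 'turner', 'electricity_usage', 13],
    [7, 'Factory_1', 'crusher', 'running_hours', 6],
    [8, 'Factory_1', 'mixer', 'running_hours', 5],
    [9, 'Factory_1', 'turner', 'running_hours', 5],
    [10, 'Factory_2', 'crusher', 'running_hours', 1],
    [11, 'Factory_2', 'mixer', 'running_hours', 3],
    [12, 'Factory_2', 'turner', 'running_hours', 6],
]
frame = pds.DataFrame(rows, columns=["id", "Location", "Type", "recorded_type", "value"])
ptable_1 = pds.pivot_table(frame, index=["Location", "Type"],
                           columns=["recorded_type"], values=["value"])

# ignore_index=False keeps the (Location, Type) index instead of dropping
# it, so each melted row still says which machine it came from.
melted = ptable_1.melt(ignore_index=False, value_name="reading")

# groupby accepts a mix of index level names ('Location') and column
# labels ('recorded_type').
per_location = melted.groupby(["Location", "recorded_type"])["reading"].sum()
print(per_location)
```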