Complex Groupby or Pivot Table Calculation in Python Pandas

Question

I have a dataframe that has been grouped as follows:

UNIT CA    DATE     SCP      TIME       LABEL       VALUES1      VALUES2           
R001 A058  08-01-13 01-00-00 01:00:00  REGULAR    340751.000   194975.000
                             05:00:00  REGULAR    340753.000   194975.000
                             09:00:00  REGULAR    341251.000   194984.000
                             09:39:56  REGULAR    341440.000   194994.000
                             13:00:00  REGULAR    341808.000   195061.000
                             17:00:00  REGULAR    342030.000   195295.000
                             21:00:00  REGULAR    342214.000   195659.000
                    01-00-01 01:00:00  REGULAR    245262.000   221709.000
                             05:00:00  REGULAR    245262.000   221709.000
                             09:00:00  REGULAR    245428.000   221742.000
                             09:39:56  REGULAR    245508.000   221754.000
                             13:00:00  REGULAR    245620.000   221856.000
                             17:00:00  REGULAR    245679.000   222178.000
                             21:00:00  REGULAR    245743.000   222604.000

I want to extract the max and min values of VALUES1 and VALUES2 for each SCP, calculate the difference between them, and return the result in the following format:

UNIT CA    DATE      SCP     DIFF OF MAX - MIN VALUE1   DIFF OF MAX - MIN VALUE2         
R001 A058  08-01-13 01-00-00        ....                         ....
                    01-00-01        ....                         ....

I can't figure out how to do it. I believe there must be some way to do it using groupby or pivot_table.

Thanks in advance.

Answer 1

IIUC, .groupby() on the index levels should work. Starting with your sample data:

df.set_index(['UNIT', 'CA', 'DATE', 'SCP'], inplace=True)

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 14 entries, (R001, A058, 2013-08-01 00:00:00, 01-00-00) to (R001, A058, 2013-08-01 00:00:00, 01-00-01)
Data columns (total 4 columns):
TIME       14 non-null object
LABEL      14 non-null object
VALUES1    14 non-null int64
VALUES2    14 non-null int64
dtypes: int64(2), object(2)

                                   TIME    LABEL  VALUES1  VALUES2
UNIT CA   DATE       SCP                                          
R001 A058 2013-08-01 01-00-00  01:00:00  REGULAR   340751   194975
                     01-00-00  05:00:00  REGULAR   340753   194975
                     01-00-00  09:00:00  REGULAR   341251   194984
                     01-00-00  09:39:56  REGULAR   341440   194994
                     01-00-00  13:00:00  REGULAR   341808   195061
                     01-00-00  17:00:00  REGULAR   342030   195295
                     01-00-00  21:00:00  REGULAR   342214   195659
                     01-00-01  01:00:00  REGULAR   245262   221709
                     01-00-01  05:00:00  REGULAR   245262   221709
                     01-00-01  09:00:00  REGULAR   245428   221742
                     01-00-01  09:39:56  REGULAR   245508   221754
                     01-00-01  13:00:00  REGULAR   245620   221856
                     01-00-01  17:00:00  REGULAR   245679   222178
                     01-00-01  21:00:00  REGULAR   245743   222604
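For anyone who wants to reproduce the frame above, here is a minimal sketch that rebuilds it from the rows in the question. The DATE format string is an assumption (month-day-year, inferred from the 2013-08-01 shown in the output), and the helper names `times`, `values1`, `values2` and `rows` are only illustrative.

```python
import pandas as pd

# Sample rows copied from the question; TIME and LABEL are kept as plain strings.
times = ["01:00:00", "05:00:00", "09:00:00", "09:39:56",
         "13:00:00", "17:00:00", "21:00:00"]
values1 = {
    "01-00-00": [340751, 340753, 341251, 341440, 341808, 342030, 342214],
    "01-00-01": [245262, 245262, 245428, 245508, 245620, 245679, 245743],
}
values2 = {
    "01-00-00": [194975, 194975, 194984, 194994, 195061, 195295, 195659],
    "01-00-01": [221709, 221709, 221742, 221754, 221856, 222178, 222604],
}

# One tuple per reading, in the same order as the table above.
rows = [
    ("R001", "A058", "08-01-13", scp, t, "REGULAR", v1, v2)
    for scp in values1
    for t, v1, v2 in zip(times, values1[scp], values2[scp])
]
df = pd.DataFrame(
    rows,
    columns=["UNIT", "CA", "DATE", "SCP", "TIME", "LABEL", "VALUES1", "VALUES2"],
)

# Assumption: DATE is month-day-year, so "08-01-13" parses to 2013-08-01.
df["DATE"] = pd.to_datetime(df["DATE"], format="%m-%d-%y")

# Same index as in the answer, without relying on inplace=True.
df = df.set_index(["UNIT", "CA", "DATE", "SCP"])
```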

Group on the MultiIndex levels, and apply the difference of max() and min() to each of the two columns:

df.groupby(level=['UNIT', 'CA', 'DATE', 'SCP'])[['VALUES1', 'VALUES2']].apply(lambda x: x.max() - x.min())

                               VALUES1  VALUES2
UNIT CA   DATE       SCP                       
R001 A058 2013-08-01 01-00-00     1463      684
                     01-00-01      481      895
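The same result can probably also be reached without a lambda, by subtracting the grouped min() from the grouped max(). The sketch below is an alternative to the call above, not part of the original answer. It assumes the grouping keys are still ordinary columns rather than index levels, uses a shortened stand-in for the sample data, and the `*_RANGE` column names are hypothetical.

```python
import pandas as pd

# Small stand-in for the question's data: two SCP groups, three readings each.
df = pd.DataFrame({
    "UNIT": ["R001"] * 6,
    "CA": ["A058"] * 6,
    "DATE": ["08-01-13"] * 6,
    "SCP": ["01-00-00"] * 3 + ["01-00-01"] * 3,
    "VALUES1": [340751, 341251, 342214, 245262, 245428, 245743],
    "VALUES2": [194975, 194984, 195659, 221709, 221742, 222604],
})

keys = ["UNIT", "CA", "DATE", "SCP"]
grouped = df.groupby(keys)[["VALUES1", "VALUES2"]]

# max - min per group, built from two built-in reductions instead of a lambda.
diff = grouped.max() - grouped.min()

# Hypothetical column names; reset_index() turns the group keys back into columns.
diff = diff.rename(
    columns={"VALUES1": "VALUES1_RANGE", "VALUES2": "VALUES2_RANGE"}
).reset_index()
print(diff)
```

Calling reset_index() at the end gives a flat table with one row per UNIT/CA/DATE/SCP combination, which is close to the layout requested in the question.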

Answer 1
2 | Accepted | 2016-05-16 00:47:45
