Complex Groupby or Pivot Table Calculation in Python Pandas
I have a DataFrame that has been grouped as follows:
UNIT  CA    DATE      SCP       TIME      LABEL    VALUES1     VALUES2
R001  A058  08-01-13  01-00-00  01:00:00  REGULAR  340751.000  194975.000
                                05:00:00  REGULAR  340753.000  194975.000
                                09:00:00  REGULAR  341251.000  194984.000
                                09:39:56  REGULAR  341440.000  194994.000
                                13:00:00  REGULAR  341808.000  195061.000
                                17:00:00  REGULAR  342030.000  195295.000
                                21:00:00  REGULAR  342214.000  195659.000
                      01-00-01  01:00:00  REGULAR  245262.000  221709.000
                                05:00:00  REGULAR  245262.000  221709.000
                                09:00:00  REGULAR  245428.000  221742.000
                                09:39:56  REGULAR  245508.000  221754.000
                                13:00:00  REGULAR  245620.000  221856.000
                                17:00:00  REGULAR  245679.000  222178.000
                                21:00:00  REGULAR  245743.000  222604.000
I want to extract the max and min values of VALUES1 and VALUES2 for each SCP, calculate the difference (max - min), and return the result in the following format:
UNIT  CA    DATE      SCP       DIFF OF MAX - MIN VALUES1  DIFF OF MAX - MIN VALUES2
R001  A058  08-01-13  01-00-00  ....                       ....
                      01-00-01  ....                       ....
I can't figure out how to do it. I believe there must be some way to do it using groupby or pivot_table.
Thanks in advance.
IIUC (if I understand correctly), .groupby() with the level argument should work. Starting with your sample data:
df.set_index(['UNIT', 'CA', 'DATE', 'SCP'], inplace=True)
df.info()
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 14 entries, (R001, A058, 2013-08-01 00:00:00, 01-00-00) to (R001, A058, 2013-08-01 00:00:00, 01-00-01)
Data columns (total 4 columns):
TIME 14 non-null object
LABEL 14 non-null object
VALUES1 14 non-null int64
VALUES2 14 non-null int64
dtypes: int64(2), object(2)
                               TIME      LABEL    VALUES1  VALUES2
UNIT CA   DATE       SCP
R001 A058 2013-08-01 01-00-00  01:00:00  REGULAR   340751   194975
                     01-00-00  05:00:00  REGULAR   340753   194975
                     01-00-00  09:00:00  REGULAR   341251   194984
                     01-00-00  09:39:56  REGULAR   341440   194994
                     01-00-00  13:00:00  REGULAR   341808   195061
                     01-00-00  17:00:00  REGULAR   342030   195295
                     01-00-00  21:00:00  REGULAR   342214   195659
                     01-00-01  01:00:00  REGULAR   245262   221709
                     01-00-01  05:00:00  REGULAR   245262   221709
                     01-00-01  09:00:00  REGULAR   245428   221742
                     01-00-01  09:39:56  REGULAR   245508   221754
                     01-00-01  13:00:00  REGULAR   245620   221856
                     01-00-01  17:00:00  REGULAR   245679   222178
                     01-00-01  21:00:00  REGULAR   245743   222604
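To reproduce the frame printed above, here is a minimal sketch that rebuilds the question's sample rows and applies the same set_index call. Two assumptions about how the original data was loaded: the values are integers (matching the int64 dtypes in the info() output), and DATE is parsed as month-day-year (matching the 2013-08-01 shown in the index).

```python
import pandas as pd

# Sample readings from the question: 7 per SCP, two SCPs.
times = ["01:00:00", "05:00:00", "09:00:00", "09:39:56",
         "13:00:00", "17:00:00", "21:00:00"]
values1 = [340751, 340753, 341251, 341440, 341808, 342030, 342214,  # SCP 01-00-00
           245262, 245262, 245428, 245508, 245620, 245679, 245743]  # SCP 01-00-01
values2 = [194975, 194975, 194984, 194994, 195061, 195295, 195659,
           221709, 221709, 221742, 221754, 221856, 222178, 222604]

df = pd.DataFrame({
    "UNIT": "R001",  # scalars are broadcast to all 14 rows
    "CA": "A058",
    # Assumption: DATE is month-day-year, which gives 2013-08-01.
    "DATE": pd.to_datetime("08-01-13", format="%m-%d-%y"),
    "SCP": ["01-00-00"] * 7 + ["01-00-01"] * 7,
    "TIME": times * 2,
    "LABEL": "REGULAR",
    "VALUES1": values1,
    "VALUES2": values2,
})

# Move the four identifying columns into a MultiIndex, as in the answer.
df.set_index(["UNIT", "CA", "DATE", "SCP"], inplace=True)
df.info()
print(df)
```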
Group on the MultiIndex levels and apply the difference of max() and min() to each of the two columns:
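# select several columns with a list (double brackets); tuple-style selection fails in pandas >= 2.0
df.groupby(level=['UNIT', 'CA', 'DATE', 'SCP'])[['VALUES1', 'VALUES2']].apply(lambda x: x.max() - x.min())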
                               VALUES1  VALUES2
UNIT CA   DATE       SCP
R001 A058 2013-08-01 01-00-00     1463      684
                     01-00-01      481      895
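A self-contained sketch of the same calculation, under the same assumptions as the sample data above (integer values, DATE parsed as month-day-year). It swaps the apply/lambda call for grouped max() minus grouped min(), relabels the columns to the headers the question asked for, and, since the question also mentions pivot_table, shows a pivot_table equivalent on the flat (un-indexed) frame. The relabeling step is optional and purely cosmetic.

```python
import pandas as pd

# Rebuild the question's sample rows (assumed: int values, DATE parsed as m-d-y).
times = ["01:00:00", "05:00:00", "09:00:00", "09:39:56",
         "13:00:00", "17:00:00", "21:00:00"]
values1 = [340751, 340753, 341251, 341440, 341808, 342030, 342214,
           245262, 245262, 245428, 245508, 245620, 245679, 245743]
values2 = [194975, 194975, 194984, 194994, 195061, 195295, 195659,
           221709, 221709, 221742, 221754, 221856, 222178, 222604]
raw = pd.DataFrame({
    "UNIT": "R001",
    "CA": "A058",
    "DATE": pd.to_datetime("08-01-13", format="%m-%d-%y"),
    "SCP": ["01-00-00"] * 7 + ["01-00-01"] * 7,
    "TIME": times * 2,
    "LABEL": "REGULAR",
    "VALUES1": values1,
    "VALUES2": values2,
})

keys = ["UNIT", "CA", "DATE", "SCP"]
df = raw.set_index(keys)

# groupby variant: per-group max and min for both columns, then subtract.
# The two results share the same index and columns, so they align row by row.
grouped = df.groupby(level=keys)[["VALUES1", "VALUES2"]]
diff = grouped.max() - grouped.min()

# Optional: label the columns the way the question asked for.
diff.columns = ["DIFF OF MAX - MIN " + col for col in diff.columns]
print(diff)

# pivot_table variant on the flat frame: the aggfunc runs once per group and column.
via_pivot = raw.pivot_table(index=keys, values=["VALUES1", "VALUES2"],
                            aggfunc=lambda s: s.max() - s.min())
print(via_pivot)
```

Both variants should give 1463/684 for SCP 01-00-00 and 481/895 for SCP 01-00-01, matching the output above. The grouped max() and min() use pandas' built-in aggregations, so they avoid calling a Python lambda once per group; on large frames that is typically faster than apply or a lambda aggfunc.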