
python pandas multiindex subtract rows with matching level 1 index

pandas DataFrame:

Constructor:

from datetime import date

import numpy as np
import pandas as pd

iterables = [[date(2018,5,31),date(2018,6,26),date(2018,6,29),date(2018,7,1)],
             ['test1','test2']]
indx = pd.MultiIndex.from_product(iterables, names=['date','tests'])
col = ['tests_passing', 'tests_total']
data = np.array([[834,3476],[229,256],[1524,1738],[78,144],[1595,1738],[78,144],[1595,1738],[142,144]])
df = pd.DataFrame(data, index=indx, columns=col)
df = df.assign(tests_remaining=df['tests_total'] - df['tests_passing'])

DataFrame:

                 tests_passing  tests_total  tests_remaining
date       tests                                             
2018-05-31 test1            834         3476             2642
           test2            229          256               27
2018-06-26 test1           1524         1738              214
           test2             78          144               66
2018-06-29 test1           1595         1738              143
           test2             78          144               66
2018-07-01 test1           1595         1738              143
           test2            142          144                2

This data consists of a number of test measurements (test1, test2, etc.), each collected on some date. I want to create a new column in this DataFrame named 'progress'. For each unique test (test1, for example), it should take that test's rows across all dates and subtract each date's 'tests_remaining' value from the value on the previous date, so basically: df.loc[(date1,test0),'progress'] = df.loc[(date0,test0),'tests_remaining'] - df.loc[(date1,test0),'tests_remaining'] (with the one exception that the first date has a progress value of 0, since it was the first date collected).

The desired output will look like this:

                 tests_passing  tests_total  tests_remaining  progress
date      tests                                                       
5/31/2018 test1            834         3476             2642         0
          test2            229          256               27         0
6/26/2018 test1           1524         1738              214      2428
          test2             78          144               66       -39
6/29/2018 test1           1595         1738              143        71
          test2             78          144               66         0
7/1/2018  test1           1595         1738              143         0
          test2            142          144                2        64

So far I have been able to use loc[] with slices to select a single test at a time and perform this calculation, producing a pandas Series, but I am unable to do this in general across all tests without specifying the test name explicitly in the slice. This is not a reasonable solution for me because the real data contains hundreds of tests.

All = slice(None)
df_slice = df.loc[(All,'test1'),'tests_remaining']
sub = df_slice.diff(periods=-1).shift(1).fillna(0)
sub

date        tests
2018-05-31  test1       0.0
2018-06-26  test1    2428.0
2018-06-29  test1      71.0
2018-07-01  test1       0.0
Name: tests_remaining, dtype: float64

Is there a more idiomatic pandas way to create the desired column as described?

Thanks in advance for your help!

You can group by the 'tests' index level, take the diff of 'tests_remaining' within each group, and multiply by -1:

df.groupby(level='tests').tests_remaining.diff().mul(-1)
Out[662]: 
date        tests
2018-05-31  test1       NaN
            test2       NaN
2018-06-26  test1    2428.0
            test2     -39.0
2018-06-29  test1      71.0
            test2      -0.0
2018-07-01  test1      -0.0
            test2      64.0
Name: tests_remaining, dtype: float64
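
The result above still has NaN for the first date of each test and is a float Series, while the desired output has 0 there and integer values. A minimal sketch of one way to turn it into the 'progress' column follows. It rebuilds the DataFrame from the question so that it runs on its own, and it assumes the rows are already ordered by date within each test (as they are here); the fillna(0) and astype(int) steps are an addition to the answer's one-liner, not part of it.

```python
from datetime import date

import numpy as np
import pandas as pd

# Rebuild the DataFrame from the question.
iterables = [
    [date(2018, 5, 31), date(2018, 6, 26), date(2018, 6, 29), date(2018, 7, 1)],
    ["test1", "test2"],
]
indx = pd.MultiIndex.from_product(iterables, names=["date", "tests"])
data = np.array(
    [[834, 3476], [229, 256], [1524, 1738], [78, 144],
     [1595, 1738], [78, 144], [1595, 1738], [142, 144]]
)
df = pd.DataFrame(data, index=indx, columns=["tests_passing", "tests_total"])
df = df.assign(tests_remaining=df["tests_total"] - df["tests_passing"])

# diff() within each test gives current - previous; negating it gives
# previous - current, i.e. how many fewer tests remain than on the prior date.
progress = -df.groupby(level="tests")["tests_remaining"].diff()

# The first date of each test has no predecessor (NaN); treat that as 0 progress.
df["progress"] = progress.fillna(0).astype(int)

print(df)
```

If the real data is not guaranteed to be in date order, calling df.sort_index() before the groupby would be a reasonable precaution, since diff() works on row order within each group.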
