Pandas - Groupby a multiindex level, get the possible combinations, then transform the data

I have been struggling with a problem that involves grouping, combinations and a transform. My current solution is:

df = df.groupby(level='lvl_2').transform(lambda x: x[0]/x[1])

But this doesn't tackle some parts of my problem.

Assuming the code below:

import pandas as pd
import numpy as np
import datetime
today = datetime.date.today()
today_1 = datetime.date.today() - datetime.timedelta(1)
today_2 = datetime.date.today() - datetime.timedelta(2)
ticker_date = [('first', 'a',today), ('first', 'a',today_1), ('first', 'a',today_2),
               ('first', 'c',today), ('first', 'c',today_1), ('first', 'c',today_2),
               ('first', 'b',today), ('first', 'b',today_1), ('first', 'b',today_2),
               ('first', 'd',today), ('first', 'd',today_1), ('first', 'd',today_2)]
index_df = pd.MultiIndex.from_tuples(ticker_date,names=['lvl_1','lvl_2','lvl_3'])
df = pd.DataFrame(np.random.rand(12), index_df, ['idx'])

The output is:

                          idx
lvl_1 lvl_2 lvl_3               
first a     2018-02-14  0.421075
            2018-02-13  0.278418
            2018-02-12  0.117888
      c     2018-02-14  0.716823
            2018-02-13  0.241261
            2018-02-12  0.772491
      b     2018-02-14  0.681738
            2018-02-13  0.636927
            2018-02-12  0.668964
      d     2018-02-14  0.770797
            2018-02-13  0.11469
            2018-02-12  0.877965

I need the following:

  1. Get a new multiindex dataframe with the possible combinations of lvl_2 elements.
  2. Transform my data to get the ratio of each pair of elements.
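For item 1, the pairs can be enumerated with itertools.combinations. This is only a minimal sketch: the labels list is typed in by hand in the order the lvl_2 values appear in the sample, and the names labels, pairs and ratio_labels are made up for illustration.

```python
from itertools import combinations

# lvl_2 labels in the order they appear in the sample index (typed by hand)
labels = ['a', 'c', 'b', 'd']

# every unordered pair, kept in order of first appearance
pairs = list(combinations(labels, 2))

# 'i/j' strings that could serve as the new lvl_2 values
ratio_labels = [f'{i}/{j}' for i, j in pairs]
print(ratio_labels)
```

Four labels give six pairs, and each pair appears only once (a/c but not c/a).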

Here is an illustration:

Here, I've created a 'new' column.

                                new
lvl_1   lvl_2       lvl_3   
first   a/c     2018-02-14  0.587418372
                2018-02-13  1.154011631
                2018-02-12  0.152607603
        a/b     2018-02-14  0.617649302
                2018-02-13  0.437127018
                2018-02-12  0.17622473
        a/d     2018-02-14  0.546285209
                2018-02-13  2.427569971
                2018-02-12  0.134274145
        c/b     2018-02-14  1.051464052
                2018-02-13  0.378789092
                2018-02-12  1.154757207
        c/d     2018-02-14  0.929976375
                2018-02-13  2.103592292
                2018-02-12  0.87986537
        b/d     2018-02-14  0.884458554
                2018-02-13  5.553465865
                2018-02-12  0.761948369

To further explain:

                                    new
    lvl_1   lvl_2       lvl_3   
    first   a/c     2018-02-14  0.587418372
                    2018-02-13  1.154011631
                    2018-02-12  0.152607603

Here, I take the ratio of the elements of a to the elements of c, date by date:

0.587418 = 0.421075/0.716823
1.154012 = 0.278418/0.241261
0.152608 = 0.117888/0.772491
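The same arithmetic can be checked with NumPy. This is a small sketch, with the a and c values copied by hand from the sample output above; the variable names are only for illustration.

```python
import numpy as np

# idx values for 'a' and 'c', copied from the sample output (newest date first)
a = np.array([0.421075, 0.278418, 0.117888])
c = np.array([0.716823, 0.241261, 0.772491])

# element-wise division pairs up the values that share the same date
ratio = a / c
print(ratio)
```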

I have tried a groupby and transform method, something like:

df = df.groupby(level='lvl_2').transform(lambda x: x[0]/x[1])

But obviously, this only divides the first value by the second value within each lvl_2 group; it never combines two different groups. Also, I don't know how to build the new multiindex from the combinations (a/c, a/b, a/d, c/b, c/d, b/d).

I feel that I am on the right path, but I am stuck.
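To make the limitation concrete, here is a reduced sketch with only the a and c groups, using values copied from the sample output. It assumes the transform is applied to the idx column and uses .iloc for explicit positional access, which differs slightly from the x[0]/x[1] form above; the date strings are plain placeholders.

```python
import pandas as pd

# two lvl_2 groups with three dates each; values copied from the sample output
index = pd.MultiIndex.from_product(
    [['first'], ['a', 'c'], ['2018-02-14', '2018-02-13', '2018-02-12']],
    names=['lvl_1', 'lvl_2', 'lvl_3'],
)
df = pd.DataFrame(
    {'idx': [0.421075, 0.278418, 0.117888, 0.716823, 0.241261, 0.772491]},
    index=index,
)

# x is the Series of a single lvl_2 group, so the lambda never sees another group;
# the scalar it returns is broadcast to every row of that group
within = df['idx'].groupby(level='lvl_2').transform(lambda x: x.iloc[0] / x.iloc[1])
print(within)
```

Every row of a ends up with 0.421075/0.278418 and every row of c with 0.716823/0.241261, which is not the a/c ratio I am after.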

If every value of the first level has the same combinations of the other levels, as in the sample, you can unstack lvl_2 into columns, reindex those columns to a MultiIndex of the pairs, and divide with div:

#same seed as in Maarten Fabré's answer
np.random.seed(42)

from itertools import combinations

#get combination of second level values
c = pd.MultiIndex.from_tuples(list(combinations(df.index.levels[1], 2)))

#reshape to unique columns of second level
print (df['idx'].unstack(1))
lvl_2                    a         b         c         d
lvl_1 lvl_3                                             
first 2018-02-12  0.731994  0.601115  0.155995  0.969910
      2018-02-13  0.950714  0.866176  0.156019  0.020584
      2018-02-14  0.374540  0.058084  0.598658  0.708073

#reindex by both levels
df1 = df['idx'].unstack(1).reindex(columns=c, level=0)
print (df1)
                         a                             b                   c
                         b         c         d         c         d         d
lvl_1 lvl_3                                                                 
first 2018-02-12  0.731994  0.731994  0.731994  0.601115  0.601115  0.155995
      2018-02-13  0.950714  0.950714  0.950714  0.866176  0.866176  0.156019
      2018-02-14  0.374540  0.374540  0.374540  0.058084  0.058084  0.598658


df2 = df['idx'].unstack(1).reindex(columns=c, level=1)
print (df2)
                         a                             b                   c
                         b         c         d         c         d         d
lvl_1 lvl_3                                                                 
first 2018-02-12  0.601115  0.155995  0.969910  0.155995  0.969910  0.969910
      2018-02-13  0.866176  0.156019  0.020584  0.156019  0.020584  0.020584
      2018-02-14  0.058084  0.598658  0.708073  0.598658  0.708073  0.708073

#divide, then flatten the MultiIndex columns to 'a/b' style labels
df3 = df1.div(df2)
df3.columns = df3.columns.map('/'.join)
#reshape back and change order of levels, sorting indices
df3 = df3.stack().reorder_levels([0,2,1]).sort_index()

print (df3)
lvl_1       lvl_3     
first  a/b  2018-02-12     1.217727
            2018-02-13     1.097599
            2018-02-14     6.448292
       a/c  2018-02-12     4.692434
            2018-02-13     6.093594
            2018-02-14     0.625632
       a/d  2018-02-12     0.754703
            2018-02-13    46.185944
            2018-02-14     0.528957
       b/c  2018-02-12     3.853437
            2018-02-13     5.551748
            2018-02-14     0.097023
       b/d  2018-02-12     0.619764
            2018-02-13    42.079059
            2018-02-14     0.082031
       c/d  2018-02-12     0.160834
            2018-02-13     7.579425
            2018-02-14     0.845476
dtype: float64
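The division step can also be written without reindex. The following is a sketch of a swapped-in variant, not the code above: the function name pairwise_ratios is hypothetical, and it selects columns with repeated labels and divides the underlying arrays. The input values are the a, b and c columns of the unstacked frame printed above.

```python
from itertools import combinations

import pandas as pd


def pairwise_ratios(wide):
    """Divide every pair of columns of `wide`; result columns are named 'i/j'."""
    pairs = list(combinations(wide.columns, 2))
    # selecting with repeated labels plays the role of the two reindex calls
    numer = wide[[i for i, _ in pairs]].to_numpy()
    denom = wide[[j for _, j in pairs]].to_numpy()
    return pd.DataFrame(
        numer / denom,
        index=wide.index,
        columns=[f'{i}/{j}' for i, j in pairs],
    )


# columns a, b, c of the unstacked frame shown above (seed 42 values)
wide = pd.DataFrame(
    {'a': [0.731994, 0.950714, 0.374540],
     'b': [0.601115, 0.866176, 0.058084],
     'c': [0.155995, 0.156019, 0.598658]},
    index=pd.Index(['2018-02-12', '2018-02-13', '2018-02-14'], name='lvl_3'),
)
ratios = pairwise_ratios(wide)
print(ratios)
```

The result is still wide (one column per pair); stacking it gives the long layout with the pair label as an index level.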

#alternative: build each ratio with xs, one lvl_1 group at a time
from itertools import combinations
def calc_ratios(data):
    comb = combinations(data.index.get_level_values('lvl_2').unique(), 2)

    ratios = {
        f'{i}/{j}': 
            data.xs(i, level='lvl_2') / 
            data.xs(j, level='lvl_2')
        for i, j in comb
    }
#     print(ratios)
    if ratios:
        return pd.concat(ratios)
result = pd.concat(calc_ratios(data) for group, data in df.groupby('lvl_1'))
     lvl_1  lvl_3       idx
a/b  first  2018-02-14  6.448292467809392
a/b  first  2018-02-13  1.0975992712883451
a/b  first  2018-02-12  1.2177269366284045
a/c  first  2018-02-14  0.6256323575698127
a/c  first  2018-02-13  6.093594353302192
a/c  first  2018-02-12  4.692433684425558
a/d  first  2018-02-14  0.5289572433565499
a/d  first  2018-02-13  46.185944271838835
a/d  first  2018-02-12  0.7547030687230791
b/d  first  2018-02-14  0.08203059119870332
b/d  first  2018-02-13  42.07905879677424
b/d  first  2018-02-12  0.6197637959891664
c/b  first  2018-02-14  10.306839775450461
c/b  first  2018-02-13  0.18012345549282302
c/b  first  2018-02-12  0.25950860865015657
c/d  first  2018-02-14  0.8454761601705119
c/d  first  2018-02-13  7.579425474360648
c/d  first  2018-02-12  0.16083404038888807

(data generated with np.random.seed(42))
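Note that this output contains c/b where the reindex approach produced b/c. A likely reason, sketched below, is that index.levels[1] holds the sorted level values while get_level_values('lvl_2').unique() keeps the order of the rows. The integer lvl_3 values are placeholders standing in for the dates.

```python
import pandas as pd

# same lvl_2 order as the question (a, c, b, d); lvl_3 integers are placeholders
index = pd.MultiIndex.from_tuples(
    [('first', k, d) for k in 'acbd' for d in (3, 2, 1)],
    names=['lvl_1', 'lvl_2', 'lvl_3'],
)

by_levels = list(index.levels[1])                               # sorted level values
by_appearance = list(index.get_level_values('lvl_2').unique())  # row order
print(by_levels)
print(by_appearance)
```

Whichever source feeds combinations decides which label of a pair becomes the numerator, so pick the one that matches the ratios you want.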
