Pandas: Divide MultiIndex data frame by row

I have a data frame with a MultiIndex (a panel), and for each group (county) I would like to divide every row's values by that county's values for a specific year.
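Written as a formula, the target is the following. Here x is a value in the table, c the county, t the year, k the column, and b the chosen base year (2007 in the answers below). With this definition every county's base-year row becomes all 1s, assuming its values are nonzero:

```latex
y_{c,t,k} = \frac{x_{c,t,k}}{x_{c,b,k}}, \qquad b = 2007
```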

>>> fields
Out[39]: ['emplvl', 'population', 'estab', 'estab_pop', 'emp_pop']
>>> df[fields]
Out[40]: 
                   emplvl  population    estab  estab_pop   emp_pop
county year                                                        
1001   2003  11134.500000       46800   801.75   0.017131  0.237917
       2004  11209.166667       48366   824.00   0.017037  0.231757
       2005  11452.166667       49676   870.75   0.017529  0.230537
       2006  11259.250000       51328   862.50   0.016804  0.219359
       2007  11403.333333       52405   879.25   0.016778  0.217600
       2008  11272.833333       53277   890.25   0.016710  0.211589
       2009  11003.833333       54135   877.00   0.016200  0.203267
       2010  10693.916667       54632   877.00   0.016053  0.195745
       2011  10627.000000         NaN   862.00        NaN       NaN
       2012  10136.916667         NaN   841.75        NaN       NaN
1003   2003  51372.250000      151509  4272.00   0.028196  0.339071
       2004  53450.583333      156266  4536.25   0.029029  0.342049
       2005  56110.250000      162183  4880.50   0.030093  0.345969
       2006  59291.000000      168121  5067.50   0.030142  0.352669
       2007  62600.083333      172404  5337.25   0.030958  0.363101
       2008  62611.500000      175827  5529.25   0.031447  0.356097
       2009  58947.666667      179406  5273.75   0.029396  0.328571
       2010  58139.583333      183195  5171.25   0.028228  0.317364
       2011  59581.000000         NaN  5157.75        NaN       NaN
       2012  60440.250000         NaN  5171.75        NaN       NaN
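To make the problem reproducible, here is a minimal sketch that rebuilds a small slice of the table above. It keeps two counties, the years 2006 to 2008 and two of the columns, with values copied from the printed output. It then computes the desired result for a single county by hand. The slice and the variable name one_county are illustrative only.

```python
import pandas as pd

# A reproducible slice of the table above: two counties, years 2006-2008,
# two of the columns (values copied from the printed output)
index = pd.MultiIndex.from_product(
    [[1001, 1003], [2006, 2007, 2008]], names=['county', 'year'])
df = pd.DataFrame(
    {'emplvl': [11259.25, 11403.333333, 11272.833333,
                59291.0, 62600.083333, 62611.5],
     'estab': [862.50, 879.25, 890.25, 5067.50, 5337.25, 5529.25]},
    index=index)

# The goal for a single county: every row of county 1001 divided,
# column by column, by that county's 2007 row (a Series indexed by column)
one_county = df.loc[1001] / df.loc[(1001, 2007)]
print(one_county)
```

The question is how to do this for all counties at once.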

I think you can select the 2007 rows into df1, move the year level into a column with reset_index, and then use div:

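fields = ['emplvl', 'population', 'estab', 'estab_pop', 'emp_pop']

# the 2007 row of each county, with the year level moved into a column,
# so df1 is indexed by county only
df1 = df.loc[df.index.get_level_values('year') == 2007, fields].reset_index(level=1)
print(df1)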
        year        emplvl  population    estab  estab_pop   emp_pop
county                                                              
1001    2007  11403.333333     52405.0   879.25   0.016778  0.217600
1003    2007  62600.083333    172404.0  5337.25   0.030958  0.363101

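# df1 is aligned on the shared 'county' level, so every (county, year) row
# is divided by that county's 2007 row
print(df[fields].div(df1[fields], axis=0))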
               emplvl  population     estab  estab_pop   emp_pop
county year                                                     
1001   2003  0.976425    0.893045  0.911857   1.021039  1.093369
       2004  0.982973    0.922927  0.937162   1.015437  1.065060
       2005  1.004282    0.947925  0.990333   1.044761  1.059453
       2006  0.987365    0.979449  0.980950   1.001550  1.008084
       2007  1.000000    1.000000  1.000000   1.000000  1.000000
       2008  0.988556    1.016640  1.012511   0.995947  0.972376
       2009  0.964966    1.033012  0.997441   0.965550  0.934131
       2010  0.937789    1.042496  0.997441   0.956789  0.899563
       2011  0.931920         NaN  0.980381        NaN       NaN
       2012  0.888943         NaN  0.957350        NaN       NaN
1003   2003  0.820642    0.878802  0.800412   0.910782  0.933820
       2004  0.853842    0.906394  0.849923   0.937690  0.942022
       2005  0.896329    0.940715  0.914422   0.972059  0.952818
       2006  0.947139    0.975157  0.949459   0.973642  0.971270
       2007  1.000000    1.000000  1.000000   1.000000  1.000000
       2008  1.000182    1.019855  1.035974   1.015796  0.980711
       2009  0.941655    1.040614  0.988102   0.949545  0.904902
       2010  0.928746    1.062591  0.968898   0.911816  0.874038
       2011  0.951772         NaN  0.966368        NaN       NaN
       2012  0.965498         NaN  0.968992        NaN       NaN
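A slightly shorter way to build the divisor uses DataFrame.xs. It selects the 2007 cross-section and drops the year level in one step, so no reset_index or column selection is needed. Passing level='county' to div makes the alignment on that level explicit instead of relying on the shared index name. This is a sketch that rebuilds a slice of the question's data; the name base is illustrative.

```python
import pandas as pd

# A slice of the question's data (two counties, years 2006-2008, two columns)
index = pd.MultiIndex.from_product(
    [[1001, 1003], [2006, 2007, 2008]], names=['county', 'year'])
df = pd.DataFrame(
    {'emplvl': [11259.25, 11403.333333, 11272.833333,
                59291.0, 62600.083333, 62611.5],
     'estab': [862.50, 879.25, 890.25, 5067.50, 5337.25, 5529.25]},
    index=index)

# Cross-section at year == 2007; the year level is dropped,
# so base is indexed by county alone (one row per county)
base = df.xs(2007, level='year')

# Broadcast base across the 'county' level of df's MultiIndex
result = df.div(base, level='county')
print(result)
```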

First, I would suggest putting everything you want to transform into a single data frame; let's assume it is called df.

For each county, the reference row is the one whose year is 2007.

Its index, a (county, year) tuple, is stored in reference_index, and every other row of that county is divided by it, column by column.

The reference rows are divided by themselves (giving 1) only at the very end, after all the other rows have been processed.

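for index in df.index:
    county = index[0]

    # index of the reference row to divide the rest of this county's rows by
    reference_index = (county, 2007)

    if index != reference_index:
        for column in df.columns:
            df.loc[index, column] = df.loc[index, column] / df.loc[reference_index, column]

# The 2007 rows must also be divided by themselves, but only after every other
# row has been processed; doing it inside the loop above would turn them into 1s
# before the remaining rows of the county were divided by them.
for county in df.index.get_level_values('county').unique():
    reference_index = (county, 2007)
    df.loc[reference_index] = df.loc[reference_index] / df.loc[reference_index]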
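The same normalization can also be written without explicit loops. You group on the county level and divide each group by its own 2007 row. This is a swapped-in technique (groupby plus apply), not the method in the answer above. The sketch rebuilds a slice of the question's data and returns a new frame instead of modifying df in place. The helper divide_by_base_year is a hypothetical name.

```python
import pandas as pd

# A slice of the question's data (two counties, years 2006-2008, two columns)
index = pd.MultiIndex.from_product(
    [[1001, 1003], [2006, 2007, 2008]], names=['county', 'year'])
df = pd.DataFrame(
    {'emplvl': [11259.25, 11403.333333, 11272.833333,
                59291.0, 62600.083333, 62611.5],
     'estab': [862.50, 879.25, 890.25, 5067.50, 5337.25, 5529.25]},
    index=index)


def divide_by_base_year(group, year=2007):
    # group holds one county's rows; its base-year row is taken as a Series
    # indexed by column, so the division is applied to every row of the group
    base_row = group.xs(year, level='year').iloc[0]
    return group / base_row


# group_keys=False keeps the original (county, year) index on the result
result = df.groupby(level='county', group_keys=False).apply(divide_by_base_year)
print(result)
```

Unlike the loop, this leaves the original df untouched, and there is no ordering issue with the reference rows because each division reads from the unmodified group.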
