简体   繁体   English

在pandas MultiIndex数据帧中旋转数据

[英]rotating data in pandas MultiIndex dataframe

I have a MultiIndex dataframe looks like, this is only partial. 我有一个MultiIndex数据帧看起来像,这只是局部的。 The Year range frome 2007 to 2015 with the same places for each year. 2007年至2015年的年度范围与每年相同的地方。

                Jan   Feb   Mar   Apr   May  June  July   Aug  Sept   Oct  \
Year Place                                                                     
2007 Johore       1.26  1.07  1.21  1.27  1.33  1.28  1.67  1.88  1.89  1.86   
     Kedah        1.20  1.27  1.50  1.38  1.38  1.52  1.84  2.09  2.08  2.02   
     Kelantan     0.92  0.90  1.01  1.10  1.07  0.87  0.93  1.02  1.08  1.17   
     Malacca      1.62  1.45  1.64  1.52  1.50  1.40  1.75  1.80  2.03  2.14   
     N. Sembilan  0.98  0.94  1.11  1.07  1.10  1.16  1.46  1.58  1.61  1.71   

                   Nov   Dec  
Year Place                    
2007 Johore       1.95  1.72  
     Kedah        1.79  1.39  
     Kelantan     1.29  0.97  
     Malacca      2.44  2.13  
     N. Sembilan  1.75  1.58  

I want to rotate the data and get a single index dataframe with the index being months (eg 2007-Jan,2007-Feb ) and the columns being different places. 我想旋转数据并获得索引为月的单个索引数据帧(例如2007年1月,2007年2月),列是不同的位置。

I tried 'Pahang' as an example and did: 我试过“彭亨”作为例子做了:

In [14]:

Pahang=df.xs('Pahang',level='Place')
In [15]:

Pahang.unstack().unstack().unstack()
Out[15]:
Year      
2007  Jan     1.19
      Feb     1.01
      Mar     1.13
      Apr     1.19
      May     1.24
      June    1.17
      July    1.43
      Aug     1.59
      Sept    1.63
      Oct     1.64
      Nov     1.82
      Dec     1.31
2008  Jan     1.57
      Feb     1.36
      Mar     1.56
...
2014  Oct     1.87
      Nov     1.74
      Dec     1.09
2015  Jan     0.93
      Feb     1.02
      Mar     1.28
      Apr     1.51
      May      NaN
      June     NaN
      July     NaN
      Aug      NaN
      Sept     NaN
      Oct      NaN
      Nov      NaN
      Dec      NaN
Length: 108, dtype: float64

I get the Pahang column as I want. 我按照自己的意愿获得了彭亨专栏。 Instead of doing it one place at a time, I am wondering if there is a way to loop through all places in a faster way. 我想知道是否有办法以更快的方式遍历所有地方,而不是一次只做一个地方。 Thanks! 谢谢!

You can do the reshaping for all Places and then select just one of them. 您可以对所有Places进行重塑,然后只选择其中一个。

import pandas as pd
import numpy as np

# your data
# ===================================
multi_index = pd.MultiIndex.from_product([np.arange(2007,2016,1), 'A B C D E'.split()], names=['Year', 'Place'])
df = pd.DataFrame( np.random.randn(45,12), columns='Jan Feb Mar Apr May June July Aug Sept Oct Nov Dec'.split(), index=multi_index)

df


               Jan     Feb     Mar   ...       Oct     Nov     Dec
Year Place                           ...                          
2007 A     -0.1512  0.7274 -0.3218   ...    1.2547 -1.8408  1.2585
     B      0.0856 -1.0458 -1.1428   ...    1.0194  1.1958  0.4905
     C     -1.2021 -0.6989 -0.0486   ...   -0.8053 -0.4929  1.6475
     D     -1.9948 -0.3465  1.3036   ...   -0.2490  0.6285 -0.0568
     E      0.0928 -1.3905  0.7203   ...   -0.1138  2.9552 -0.0272
2008 A     -1.2595  1.3072  0.6121   ...   -1.4275  0.8769  2.0671
     B      0.3611 -0.4187 -2.9609   ...   -1.2944  1.2752 -0.0947
     C      1.6492  0.0340 -0.9743   ...    0.0550  1.4135  0.8862
     D      0.9034 -0.2957  0.2152   ...    1.0947 -0.2405  0.0367
     E      0.9566  1.1927  0.0852   ...    0.7396  0.8240 -1.6628
...            ...     ...     ...   ...       ...     ...     ...
2014 A      0.7478 -0.8905  0.6238   ...   -1.0907 -0.2919  0.3261
     B      3.6764 -0.0601  1.2751   ...    0.3294 -1.3375 -1.5087
     C      2.3460 -0.4181  0.0607   ...   -0.8270  0.0536 -0.4353
     D      0.9733 -0.6863  0.5278   ...   -1.8206  0.4788  1.1438
     E     -0.3514  2.4570 -0.8567   ...    1.3434 -1.5634 -0.9984
2015 A      1.2849 -1.0657 -0.1173   ...   -0.1733  0.0441  0.0922
     B      0.5802 -0.5912  1.1193   ...   -0.1296 -0.6374 -1.7727
     C     -0.5026 -1.3111 -0.5499   ...    0.7308  1.2570  0.8733
     D     -1.6482 -0.2213  0.3336   ...   -1.3141 -2.0377 -1.1468
     E     -2.0796 -0.2808 -1.4079   ...   -0.3052  0.7999  0.3516

[45 rows x 12 columns]

# processing
# ==================================
res = df.stack().unstack(level='Place')

Place           A       B       C       D       E
Year                                             
2007 Jan  -0.1512  0.0856 -1.2021 -1.9948  0.0928
     Feb   0.7274 -1.0458 -0.6989 -0.3465 -1.3905
     Mar  -0.3218 -1.1428 -0.0486  1.3036  0.7203
     Apr  -1.4641  2.0384  0.6518  0.8756 -1.4627
     May  -0.8896 -1.6627  0.6990  0.2008  0.7423
     June -0.5339 -0.6629  0.1121  0.3618  1.3838
     July -0.4851  0.6544  0.5251  0.3394 -0.7016
     Aug  -1.2445  0.9671 -1.0684 -0.4776 -0.2936
     Sept  1.1330 -0.7543  1.6029  0.5543  0.3234
     Oct   1.2547  1.0194 -0.8053 -0.2490 -0.1138
...           ...     ...     ...     ...     ...
2015 Mar  -0.1173  1.1193 -0.5499  0.3336 -1.4079
     Apr  -1.0528  0.2421  0.3419 -2.1137 -0.2836
     May  -1.0709 -0.1794 -0.2682 -0.3226  0.8654
     June -1.4538 -0.7313  0.3177 -1.4008  1.1357
     July -1.6210 -0.3815 -0.9876  0.1019  1.7450
     Aug   0.5692  0.7679  1.1893 -0.9612  0.0903
     Sept  0.2371  0.6740  0.9204 -0.2909 -0.8197
     Oct  -0.1733 -0.1296  0.7308 -1.3141 -0.3052
     Nov   0.0441 -0.6374  1.2570 -2.0377  0.7999
     Dec   0.0922 -1.7727  0.8733 -1.1468  0.3516

[108 rows x 5 columns]


# select one place
res['A']

Year      
2007  Jan    -0.1512
      Feb     0.7274
      Mar    -0.3218
      Apr    -1.4641
      May    -0.8896
      June   -0.5339
      July   -0.4851
      Aug    -1.2445
      Sept    1.1330
      Oct     1.2547
               ...  
2015  Mar    -0.1173
      Apr    -1.0528
      May    -1.0709
      June   -1.4538
      July   -1.6210
      Aug     0.5692
      Sept    0.2371
      Oct    -0.1733
      Nov     0.0441
      Dec     0.0922
Name: A, dtype: float64

Instead of using unstack to swap index levels, you could use swaplevel : 而不是使用unstack掉指数的水平,你可以使用swaplevel

In [157]: df.xs('Johore',level='Place').unstack().swaplevel(0,1)
Out[157]: 
Year      
2007  Jan     1.26
      Feb     1.07
      Mar     1.21
      Apr     1.27
      May     1.33
      June    1.28
      July    1.67
      Aug     1.88
      Sept    1.89
      Oct     1.86
dtype: float64

There is also the reorder_levels method which can be useful if your MultiIndex has many levels and need to perform more than one swap. 还有reorder_levels方法,如果您的MultiIndex具有多个级别并且需要执行多个交换,则该方法非常有用。

I was thinking roughly along the same lines as @HappyLeapSecond, but will add this because it's not quite the same and is a little more general (for all rows rather than just a particular one). 我大致沿着和@HappyLeapSecond一样思考,但是会添加它,因为它不太相同而且更加通用(对于所有行而不仅仅是特定行)。

First off, I'm going to use a slightly different sample dataset. 首先,我将使用略有不同的样本数据集。 Note also that I'm posting without the multi-index because a single level index is easier to copy and paste into pandas. 另请注意,我在没有多索引的情况下发布,因为单级索引更容易复制并粘贴到pandas中。

   year     place   Jan   Feb   Mar   Apr   May  June  July   Aug
0  2007    Johore  1.26  1.07  1.21  1.27  1.33  1.28  1.67  1.88
1  2007     Kedah  1.20  1.27  1.50  1.38  1.38  1.52  1.84  2.09
2  2007  Kelantan  0.92  0.90  1.01  1.10  1.07  0.87  0.93  1.02
3  2007   Malacca  1.62  1.45  1.64  1.52  1.50  1.40  1.75  1.80
4  2008    Johore  1.26  1.07  1.21  1.27  1.33  1.28  1.67  1.88
5  2008     Kedah  1.20  1.27  1.50  1.38  1.38  1.52  1.84  2.09
6  2008  Kelantan  0.92  0.90  1.01  1.10  1.07  0.87  0.93  1.02
7  2008   Malacca  1.62  1.45  1.64  1.52  1.50  1.40  1.75  1.80

Then, set the index so it's comparable to the one in the question: 然后,设置索引,使其与问题中的索引相当:

df = df.reset_index(drop=True).set_index(['year','place'])

                Jan   Feb   Mar   Apr   May  June  July   Aug
year place                                                   
2007 Johore    1.26  1.07  1.21  1.27  1.33  1.28  1.67  1.88
     Kedah     1.20  1.27  1.50  1.38  1.38  1.52  1.84  2.09
     Kelantan  0.92  0.90  1.01  1.10  1.07  0.87  0.93  1.02
     Malacca   1.62  1.45  1.64  1.52  1.50  1.40  1.75  1.80
2008 Johore    1.26  1.07  1.21  1.27  1.33  1.28  1.67  1.88
     Kedah     1.20  1.27  1.50  1.38  1.38  1.52  1.84  2.09
     Kelantan  0.92  0.90  1.01  1.10  1.07  0.87  0.93  1.02
     Malacca   1.62  1.45  1.64  1.52  1.50  1.40  1.75  1.80

And then some switching around, resorting, etc. The main "problem" you have in your data is that you start with the year in the row axis but the month in the column axis. 然后是一些切换,求助等。数据中的主要“问题”是,您从行轴中的年份开始,但是在列轴中的月份。 So what you need to do is move the year indexing from the row to the column. 因此,您需要做的是将年份索引从行移动到列。 That's done with unstack(level='year') . 这是在unstack(level='year') The rest of it is basically just a matter of cleaning up. 其余部分基本上只是清理问题。

df.unstack(level='year').swaplevel(0,1,axis=1).T.sortlevel(0)

place      Johore  Kedah  Kelantan  Malacca
year                                       
2007 Jan     1.26   1.20      0.92     1.62
     Feb     1.07   1.27      0.90     1.45
     Mar     1.21   1.50      1.01     1.64
     Apr     1.27   1.38      1.10     1.52
     May     1.33   1.38      1.07     1.50
     June    1.28   1.52      0.87     1.40
     July    1.67   1.84      0.93     1.75
     Aug     1.88   2.09      1.02     1.80
2008 Jan     1.26   1.20      0.92     1.62
     Feb     1.07   1.27      0.90     1.45
     Mar     1.21   1.50      1.01     1.64
     Apr     1.27   1.38      1.10     1.52
     May     1.33   1.38      1.07     1.50
     June    1.28   1.52      0.87     1.40
     July    1.67   1.84      0.93     1.75
     Aug     1.88   2.09      1.02     1.80

Edit to add: This last line can be simplified by using @JianxunLi's solution. 编辑添加:使用@KianxunLi的解决方案可以简化最后一行。

df.stack().unstack(level='place')

That's a better way of solving the problem of getting year/month on the same index and place on the opposite index, but I'll leave this answer here for now in the event that seeing an alternate method and explanation is helpful. 这是一个更好的方法来解决在同一个索引上获得年/月的问题并放在相反的索引上,但是我现在将这个答案留在这里,因为看到替代方法和解释是有帮助的。

You should use pandas.pivot . 你应该使用pandas.pivot An example is here: Pandas transposition inside dataframe 这里有一个例子: 数据帧内的Pandas转置

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM