Reorder multi-index pandas dataframe data after pivot

Question

I'm building an analysis tool for public transportation data and want to reorder data in a pandas dataframe. The problem is best explained with the following example.

My data initially has this shape:

            Population                                GDP per capita
date        2015          2016          2017          2015            2016            2017
country                        
France      66593366.0    66859768.0    67118648.0    40564.460707    41357.986933    42850.386280
Germany     81686611.0    82348669.0    82695000.0    47810.836011    48943.101805    50638.890964
Italy       60730582.0    60627498.0    60551416.0    36640.115578    38380.172412    39426.940797
Spain       46444832.0    46484062.0    46572028.0    34818.120507    36305.222132    37997.852337

I want to reshape the dataframe so that the dates are the top-level column index and the Population and GDP per capita labels, which currently sit on the top level, move to the lower level. The resulting dataframe should look as follows:

            2015                            2016                            2017
date        Population    GDP per capita    Population    GDP per capita    Population    GDP per capita
country
France      66593366.0    40564.460707      66859768.0    41357.986933      67118648.0    42850.386280
Germany     81686611.0    47810.836011      82348669.0    48943.101805      82695000.0    50638.890964
Italy       60730582.0    36640.115578      60627498.0    38380.172412      60551416.0    39426.940797
Spain       46444832.0    34818.120507      46484062.0    36305.222132      46572028.0    37997.852337

How can I achieve this using pandas? I've been experimenting with swaplevel but was not able to get the expected result.

The dataframe is obtained from the following data with a pivot operation:

       country    date    Population    GDP per capita    GNI per capita

1      Germany    2017    82695000.0    50638.890964    51680.0
2      Germany    2016    82348669.0    48943.101805    49770.0
3      Germany    2015    81686611.0    47810.836011    48690.0
60     Spain      2017    46572028.0    37997.852337    37990.0
61     Spain      2016    46484062.0    36305.222132    36300.0
62     Spain      2015    46444832.0    34818.120507    34740.0
119    France     2017    67118648.0    42850.386280    43790.0
120    France     2016    66859768.0    41357.986933    42020.0
121    France     2015    66593366.0    40564.460707    41100.0
237    Italy      2017    60551416.0    39426.940797    39640.0
238    Italy      2016    60627498.0    38380.172412    38470.0
239    Italy      2015    60730582.0    36640.115578    36440.0

And the following pivot:

df_p = df_small.pivot(
    index='country', 
    columns='date', 
    values=['Population', 'GDP per capita'])
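
For anyone who wants to reproduce the starting point, here is a minimal sketch that rebuilds df_small from the rows shown above and applies the same pivot. The original row index (1, 2, 3, 60, ...) is left out, which is an assumption that it plays no role in the pivot.

```python
import pandas as pd

# Rows copied from the table in the question; the original row index is omitted.
df_small = pd.DataFrame({
    'country': ['Germany'] * 3 + ['Spain'] * 3 + ['France'] * 3 + ['Italy'] * 3,
    'date': [2017, 2016, 2015] * 4,
    'Population': [82695000.0, 82348669.0, 81686611.0,
                   46572028.0, 46484062.0, 46444832.0,
                   67118648.0, 66859768.0, 66593366.0,
                   60551416.0, 60627498.0, 60730582.0],
    'GDP per capita': [50638.890964, 48943.101805, 47810.836011,
                       37997.852337, 36305.222132, 34818.120507,
                       42850.386280, 41357.986933, 40564.460707,
                       39426.940797, 38380.172412, 36640.115578],
    'GNI per capita': [51680.0, 49770.0, 48690.0,
                       37990.0, 36300.0, 34740.0,
                       43790.0, 42020.0, 41100.0,
                       39640.0, 38470.0, 36440.0],
})

df_p = df_small.pivot(
    index='country',
    columns='date',
    values=['Population', 'GDP per capita'])

# The measure level comes first and has no name; the date level is second.
print(df_p.columns.names)
```

The column level names matter for the answers below: the measure level is unnamed, so it can only be addressed by position or as None.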

Answer 1

Swap the column levels, then sort the columns with sort_index:

df_p.columns = df_p.columns.swaplevel(1,0)
df_p = df_p.sort_index(axis = 1)


date    2015                        2016                        2017
        GDP per capita  Population  GDP per capita  Population  GDP per capita  Population
country                     
France  40564.460707    66593366.0  41357.986933    66859768.0  42850.386280    67118648.0
Germany 47810.836011    81686611.0  48943.101805    82348669.0  50638.890964    82695000.0
Italy   36640.115578    60730582.0  38380.172412    60627498.0  39426.940797    60551416.0
Spain   34818.120507    46444832.0  36305.222132    46484062.0  37997.852337    46572028.0
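
The question mentions that swaplevel alone did not give the expected result. A small sketch, using a reduced two-country, two-year frame as an assumption for brevity, may show why: swapping only changes which level is on top, while the column order stays grouped by measure until the columns are sorted.

```python
import pandas as pd

# Reduced sample (two countries, two years) taken from the question's data.
df_small = pd.DataFrame({
    'country': ['France', 'France', 'Spain', 'Spain'],
    'date': [2015, 2016, 2015, 2016],
    'Population': [66593366.0, 66859768.0, 46444832.0, 46484062.0],
    'GDP per capita': [40564.460707, 41357.986933, 34818.120507, 36305.222132],
})
df_p = df_small.pivot(index='country', columns='date',
                      values=['Population', 'GDP per capita'])

# swaplevel alone: dates are on top, but columns are still ordered by measure.
swapped = df_p.swaplevel(0, 1, axis=1)
print(list(swapped.columns))

# Sorting the columns afterwards is what groups them by date.
result = swapped.sort_index(axis=1)
print(list(result.columns))
```

Note that a plain sort_index orders the lower level alphabetically, so GDP per capita ends up before Population, as in the output above.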

Answer 2

At a broad level, you want to do something like this:

# The measure level of the pivoted columns doesn't get a name, so it is referred to as None.
(
    df.pivot(index='country', columns='date', values=['GDP per capita', 'Population'])
    .reorder_levels(['date', None], axis=1)
    .sort_index(level=[0, 1], axis=1, ascending=[True, False])
)

First, do the pivot. Then reorder the levels to put the date on top. On its own that isn't quite right: the columns are still ordered by measure, so the MultiIndex shows a separate date entry above every single column instead of grouping them.

So the second step is to sort the column index by its levels, which groups the columns by date. You end up with this:

date           2015                       2016                       2017               
         Population GDP per capita  Population GDP per capita  Population GDP per capita
country                                                                                 
France   66593366.0   40564.460707  66859768.0   41357.986933  67118648.0   42850.386280
Germany  81686611.0   47810.836011  82348669.0   48943.101805  82695000.0   50638.890964
Italy    60730582.0   36640.115578  60627498.0   38380.172412  60551416.0   39426.940797
Spain    46444832.0   34818.120507  46484062.0   36305.222132  46572028.0   37997.852337

Also, it'd be great if your data were easier to read in, instead of having to jury-rig something with pd.read_csv(string_io_obj, sep='\\s\\s+'), but that's just a minor quibble.
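
For reference, a minimal sketch of that workaround: reading whitespace-aligned text where column names contain single spaces. The sample text is a hypothetical excerpt without the row index column, and engine='python' is passed explicitly because the separator is a regular expression.

```python
import io
import pandas as pd

# Hypothetical excerpt of the question's table, without the row index column.
text = """\
country    date    Population    GDP per capita
Germany    2017    82695000.0    50638.890964
Spain      2016    46484062.0    36305.222132
"""

# Two or more whitespace characters separate fields, so the single
# spaces inside "GDP per capita" do not split the column name.
df = pd.read_csv(io.StringIO(text), sep=r'\s\s+', engine='python')
print(df.columns.tolist())
```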

Because explicit sort instructions are passed for both levels, level=1 of the columns is sorted in reverse order, which puts Population before GDP per capita. That only works here because the desired order happens to be reverse alphabetical; it will not help when someone wants an explicit ordering that is neither alphabetical nor its reverse.
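
One possible way around that limitation, sketched here as an assumption rather than as part of either answer, is to build the target column index explicitly and reindex against it. The measure order below is an arbitrary illustration, and the frame is a reduced sample of the question's data.

```python
import pandas as pd

# Reduced sample (two countries, two years) taken from the question's data.
df_small = pd.DataFrame({
    'country': ['France', 'France', 'Spain', 'Spain'],
    'date': [2015, 2016, 2015, 2016],
    'Population': [66593366.0, 66859768.0, 46444832.0, 46484062.0],
    'GDP per capita': [40564.460707, 41357.986933, 34818.120507, 36305.222132],
    'GNI per capita': [41100.0, 42020.0, 34740.0, 36300.0],
})

# Desired measure order: neither alphabetical nor reverse alphabetical.
measures = ['GDP per capita', 'Population', 'GNI per capita']
df_p = df_small.pivot(index='country', columns='date', values=measures)

# Build every (date, measure) pair in the wanted order, then reindex the
# swapped columns against it.
dates = sorted(df_small['date'].unique())
target = pd.MultiIndex.from_product([dates, measures], names=['date', None])
result = df_p.swaplevel(0, 1, axis=1).reindex(columns=target)
print(list(result.columns))
```

Since the target index is spelled out, the column order no longer depends on how the labels happen to sort.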
