Setting a row index on, and querying, a pandas DataFrame with MultiIndex columns

Question

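Starting from a pandas DataFrame with a multi-level column header structure such as the following, is there a way to change the Area Names and Area Codes headings so that they span all of the header levels (that is, so that a single Area Names label and a single Area Codes label each span the multiple column header rows)?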

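If so, how could I then query the frame to return only the rows that correspond to a particular value (for example, an Area Code of E06000047), or only the Low and Very High values for ENGLAND in 2012/13?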

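I wonder whether it would be easier to define a row index based on the Area Codes, the Area Names, or a two-column row index ['*Area Code*', '*Area Names*']. If so, how can I do that from the current table? set_index does not seem to work with the current structure.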

Code snippet to create the frame described above:

import pandas as pd

df= pd.DataFrame({('2011/12*', 'High', '7-8'): {3: 49.83,
  5: 50.01,
  7: 48.09,
  8: 43.58,
  9: 44.19},
 ('2011/12*', 'Low', '0-4'): {3: 6.51, 5: 6.53, 7: 6.49, 8: 6.41, 9: 6.12},
 ('2011/12*', 'Medium', '5-6'): {3: 17.44,
  5: 17.59,
  7: 18.11,
  8: 19.23,
  9: 20.01},
 ('2011/12*', 'Very High', '9-10'): {3: 26.22,
  5: 25.87,
  7: 27.32,
  8: 30.78,
  9: 29.68},
 ('2012/13*', 'High', '7-8'): {3: 51.16,
  5: 51.35,
  7: 48.47,
  8: 44.67,
  9: 49.39},
 ('2012/13*', 'Low', '0-4'): {3: 5.71, 5: 5.74, 7: 6.73, 8: 8.42, 9: 6.51},
 ('2012/13*', 'Medium', '5-6'): {3: 17.1,
  5: 17.29,
  7: 18.46,
  8: 20.23,
  9: 15.81},
 ('2012/13*', 'Very High', '9-10'): {3: 26.03,
  5: 25.62,
  7: 26.34,
  8: 26.68,
  9: 28.3},
 ('Area Codes', 'Area Codes', 'Area Codes'): {3: 'K02000001',
  5: 'E92000001',
  7: 'E12000001',
  8: 'E06000047',
  9: 'E06000005'},
 ('Area Names', 'Area Names', 'Area Names'): {3: 'UNITED KINGDOM',
  5: 'ENGLAND',
  7: 'NORTH EAST',
  8: 'County Durham',
  9: 'Darlington'}})

Answer 1

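I think you need set_index with tuples as the keys, because the columns are a MultiIndex and each column is identified by its full tuple: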

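# Each key is a full column tuple; a list of two tuples builds a two-level row index.
df.set_index([('Area Codes','Area Codes','Area Codes'),
              ('Area Names','Area Names','Area Names')], inplace=True)
# Replace the tuple-valued level names with plain strings.
df.index.names = ['Area Codes','Area Names']
print(df)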
                          2011/12*                        2012/13*        \
                              High   Low Medium Very High     High   Low   
                               7-8   0-4    5-6      9-10      7-8   0-4   
Area Codes Area Names                                                      
K02000001  UNITED KINGDOM    49.83  6.51  17.44     26.22    51.16  5.71   
E92000001  ENGLAND           50.01  6.53  17.59     25.87    51.35  5.74   
E12000001  NORTH EAST        48.09  6.49  18.11     27.32    48.47  6.73   
E06000047  County Durham     43.58  6.41  19.23     30.78    44.67  8.42   
E06000005  Darlington        44.19  6.12  20.01     29.68    49.39  6.51   


                          Medium Very High  
                             5-6      9-10  
Area Codes Area Names                       
K02000001  UNITED KINGDOM  17.10     26.03  
E92000001  ENGLAND         17.29     25.62  
E12000001  NORTH EAST      18.46     26.34  
E06000047  County Durham   20.23     26.68  
E06000005  Darlington      15.81     28.30
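The same step can be tried on a smaller frame without inplace=True. This is only a sketch: the two-row frame and the names small and indexed are made up for illustration, and rename_axis is used here in place of assigning to df.index.names.

```python
import pandas as pd

# Hypothetical two-row frame with three column levels, mirroring the question's layout.
small = pd.DataFrame({
    ('2012/13*', 'Low', '0-4'): [5.74, 8.42],
    ('2012/13*', 'Very High', '9-10'): [25.62, 26.68],
    ('Area Codes', 'Area Codes', 'Area Codes'): ['E92000001', 'E06000047'],
    ('Area Names', 'Area Names', 'Area Names'): ['ENGLAND', 'County Durham'],
})

# Each key is a full column tuple; set_index returns a new frame here.
indexed = small.set_index([
    ('Area Codes', 'Area Codes', 'Area Codes'),
    ('Area Names', 'Area Names', 'Area Names'),
])

# Give the two row levels plain string names instead of tuples.
indexed = indexed.rename_axis(index=['Area Codes', 'Area Names'])

print(indexed.index.names)
print(indexed.columns.nlevels)
```

The two label columns move out of the column MultiIndex and into the row index, so the remaining columns still carry all three header levels.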

Then sort_index is needed, because slicing the unsorted row index raises:

KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)'

df.sort_index(inplace=True)
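A quick way to see whether sorting is still needed is to check whether the row index is monotonic. A minimal sketch on a made-up three-row frame (frame and sorted_frame are illustrative names, not part of the question's data):

```python
import pandas as pd

# Hypothetical two-level row index, in the same unsorted order as the question's rows.
index = pd.MultiIndex.from_tuples(
    [('K02000001', 'UNITED KINGDOM'),
     ('E92000001', 'ENGLAND'),
     ('E06000047', 'County Durham')],
    names=['Area Codes', 'Area Names'],
)
frame = pd.DataFrame({'Low': [5.71, 5.74, 8.42]}, index=index)

# Before sorting, the index is not in lexicographic order.
print(frame.index.is_monotonic_increasing)         # False

# sort_index without inplace=True returns a sorted copy.
sorted_frame = frame.sort_index()
print(sorted_frame.index.is_monotonic_increasing)  # True

# Slicing across the second level works on the sorted copy.
idx = pd.IndexSlice
print(sorted_frame.loc[idx['E06000047', :], :])
```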

Finally, select with slicers (pd.IndexSlice):

idx = pd.IndexSlice
print (df.loc[idx['E06000047',:], :])

                        2011/12*                        2012/13*        \
                             High   Low Medium Very High     High   Low   
                              7-8   0-4    5-6      9-10      7-8   0-4   
Area Codes Area Names                                                     
E06000047  County Durham    43.58  6.41  19.23     30.78    44.67  8.42   


                         Medium Very High  
                            5-6      9-10  
Area Codes Area Names                      
E06000047  County Durham  20.23     26.68

print (df.loc[idx[:,'ENGLAND'], idx['2012/13*',['Low','Very High']]])
                      2012/13*          
                           Low Very High
                           0-4      9-10
Area Codes Area Names                   
E92000001  ENGLAND        5.74     25.62
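As an alternative to slicers (not part of the approach above), a single row level can also be matched with xs or with a boolean mask on get_level_values. A sketch, assuming a made-up two-row frame with the same index and column layout:

```python
import pandas as pd

# Hypothetical frame: two row levels, three column levels, as in the answer's result.
columns = pd.MultiIndex.from_tuples(
    [('2012/13*', 'Low', '0-4'), ('2012/13*', 'Very High', '9-10')]
)
index = pd.MultiIndex.from_tuples(
    [('E06000047', 'County Durham'), ('E92000001', 'ENGLAND')],
    names=['Area Codes', 'Area Names'],
)
frame = pd.DataFrame([[8.42, 26.68], [5.74, 25.62]], index=index, columns=columns)

# Cross-section on one row level; drop_level=False keeps both index levels.
durham = frame.xs('E06000047', level='Area Codes', drop_level=False)

# Boolean mask on one level's values.
england = frame[frame.index.get_level_values('Area Names') == 'ENGLAND']

# A single column is addressed by its full tuple.
england_low = england[('2012/13*', 'Low', '0-4')]

print(durham)
print(england_low)
```

The mask form can be handy when the condition is more than a simple equality, for example isin over several area codes.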

Answer 1: accepted, 2016-09-28 11:09:08
