
Setting a row index on and querying a pandas dataframe with multi-index columns

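Starting from a pandas DataFrame with a multi-level column header structure such as the one below, is there a way to transform the Area Names and Area Codes headings so that each spans every level of the header (that is, so that a single Area Names or Area Codes label spans the multiple column header rows)?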

[Image: table with multi-level column headers]

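If so, how could I then run a query on the columns to return only the row corresponding to a particular value (for example, an Area Codes value of E06000047), or the Very High values for ENGLAND in 2012/13?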

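I wonder whether it would be easier to define a row index based on Area Codes, on Area Names, or on a two-column row index ['*Area Code*', '*Area Names*']. If so, how can I do that from the current table? set_index does not seem to work with the current structure.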

Code fragment to create the table above:

import pandas as pd

df= pd.DataFrame({('2011/12*', 'High', '7-8'): {3: 49.83,
  5: 50.01,
  7: 48.09,
  8: 43.58,
  9: 44.19},
 ('2011/12*', 'Low', '0-4'): {3: 6.51, 5: 6.53, 7: 6.49, 8: 6.41, 9: 6.12},
 ('2011/12*', 'Medium', '5-6'): {3: 17.44,
  5: 17.59,
  7: 18.11,
  8: 19.23,
  9: 20.01},
 ('2011/12*', 'Very High', '9-10'): {3: 26.22,
  5: 25.87,
  7: 27.32,
  8: 30.78,
  9: 29.68},
 ('2012/13*', 'High', '7-8'): {3: 51.16,
  5: 51.35,
  7: 48.47,
  8: 44.67,
  9: 49.39},
 ('2012/13*', 'Low', '0-4'): {3: 5.71, 5: 5.74, 7: 6.73, 8: 8.42, 9: 6.51},
 ('2012/13*', 'Medium', '5-6'): {3: 17.1,
  5: 17.29,
  7: 18.46,
  8: 20.23,
  9: 15.81},
 ('2012/13*', 'Very High', '9-10'): {3: 26.03,
  5: 25.62,
  7: 26.34,
  8: 26.68,
  9: 28.3},
 ('Area Codes', 'Area Codes', 'Area Codes'): {3: 'K02000001',
  5: 'E92000001',
  7: 'E12000001',
  8: 'E06000047',
  9: 'E06000005'},
 ('Area Names', 'Area Names', 'Area Names'): {3: 'UNITED KINGDOM',
  5: 'ENGLAND',
  7: 'NORTH EAST',
  8: 'County Durham',
  9: 'Darlington'}})

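I think you need set_index with tuples, because the columns are a MultiIndex and each column is identified by its full tuple of labels: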

df.set_index([('Area Codes','Area Codes','Area Codes'),
              ('Area Names','Area Names','Area Names')], inplace=True)
df.index.names = ['Area Codes','Area Names']
print (df)
                          2011/12*                        2012/13*        \
                              High   Low Medium Very High     High   Low   
                               7-8   0-4    5-6      9-10      7-8   0-4   
Area Codes Area Names                                                      
K02000001  UNITED KINGDOM    49.83  6.51  17.44     26.22    51.16  5.71   
E92000001  ENGLAND           50.01  6.53  17.59     25.87    51.35  5.74   
E12000001  NORTH EAST        48.09  6.49  18.11     27.32    48.47  6.73   
E06000047  County Durham     43.58  6.41  19.23     30.78    44.67  8.42   
E06000005  Darlington        44.19  6.12  20.01     29.68    49.39  6.51   


                          Medium Very High  
                             5-6      9-10  
Area Codes Area Names                       
K02000001  UNITED KINGDOM  17.10     26.03  
E92000001  ENGLAND         17.29     25.62  
E12000001  NORTH EAST      18.46     26.34  
E06000047  County Durham   20.23     26.68  
E06000005  Darlington      15.81     28.30 
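
The same step can also be written without inplace=True. The following is only a minimal sketch: the two-row frame `small` and the names `keys` and `indexed` are hypothetical, reusing a few values from the question, and `rename_axis` is swapped in for the direct assignment to `df.index.names`.

```python
import pandas as pd

# Hypothetical two-row sample that reuses the question's column layout.
small = pd.DataFrame({
    ('2012/13*', 'Low', '0-4'): [5.74, 8.42],
    ('2012/13*', 'Very High', '9-10'): [25.62, 26.68],
    ('Area Codes', 'Area Codes', 'Area Codes'): ['E92000001', 'E06000047'],
    ('Area Names', 'Area Names', 'Area Names'): ['ENGLAND', 'County Durham'],
})

# Each key is the full column tuple, one label per header level.
keys = [('Area Codes',) * 3, ('Area Names',) * 3]

# set_index returns a new frame by default; rename_axis then replaces the
# tuple-valued level names with plain strings.
indexed = small.set_index(keys).rename_axis(index=['Area Codes', 'Area Names'])

print(list(indexed.index.names))
print(indexed.columns.nlevels)
```

The data columns keep their three header levels; only the two label columns move into the row index.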

Then sort_index is needed, because slicing an unsorted MultiIndex raises:

KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)'

df.sort_index(inplace=True)
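
To see why the sort matters, here is a small sketch on a hypothetical single-column frame (the column name `value` and the variable names are assumptions, the numbers are taken from the question). It checks whether the row index is ordered before and after sorting, then slices the sorted frame.

```python
import pandas as pd

# Row labels in the question's original (unsorted) order.
index = pd.MultiIndex.from_tuples(
    [('K02000001', 'UNITED KINGDOM'),
     ('E92000001', 'ENGLAND'),
     ('E06000047', 'County Durham')],
    names=['Area Codes', 'Area Names'])
unsorted = pd.DataFrame({'value': [49.83, 50.01, 43.58]}, index=index)

# The labels are not in lexicographic order yet.
was_sorted = unsorted.index.is_monotonic_increasing

# sort_index returns a sorted copy when inplace is not set.
ordered = unsorted.sort_index()
now_sorted = ordered.index.is_monotonic_increasing

# With a sorted index, slicing across the second level works.
row = ordered.loc[pd.IndexSlice['E06000047', :], :]
print(was_sorted, now_sorted)
print(row)
```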

Finally, select using slicers:

idx = pd.IndexSlice
print (df.loc[idx['E06000047',:], :])

                        2011/12*                        2012/13*        \
                             High   Low Medium Very High     High   Low   
                              7-8   0-4    5-6      9-10      7-8   0-4   
Area Codes Area Names                                                     
E06000047  County Durham    43.58  6.41  19.23     30.78    44.67  8.42   


                         Medium Very High  
                            5-6      9-10  
Area Codes Area Names                      
E06000047  County Durham  20.23     26.68  

print (df.loc[idx[:,'ENGLAND'], idx['2012/13*',['Low','Very High']]])
                      2012/13*          
                           Low Very High
                           0-4      9-10
Area Codes Area Names                   
E92000001  ENGLAND        5.74     25.62
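
As a possible alternative to slicers, `DataFrame.xs` selects on a named index level. This is only a sketch on a hypothetical two-row frame `sample` built from values in the question; the variable names are assumptions, and it is not part of the original answer.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('E06000047', 'County Durham'), ('E92000001', 'ENGLAND')],
    names=['Area Codes', 'Area Names'])
columns = pd.MultiIndex.from_tuples(
    [('2012/13*', 'Low', '0-4'), ('2012/13*', 'Very High', '9-10')])
sample = pd.DataFrame([[8.42, 26.68], [5.74, 25.62]],
                      index=index, columns=columns)

# Row for one area code; drop_level=False keeps both index levels.
durham = sample.xs('E06000047', level='Area Codes', drop_level=False)

# Very High values for ENGLAND in 2012/13: pick the row by name, then
# address the column with a partial tuple of header labels.
england_very_high = (sample.xs('ENGLAND', level='Area Names')
                           .loc[:, ('2012/13*', 'Very High')])

print(durham)
print(england_very_high)
```

Selecting by level name avoids having to remember the position of each level inside `pd.IndexSlice`.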
