Setting a row index on and querying a pandas dataframe with multi-index columns
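Starting from a pandas dataframe with a multi-dimensional column heading structure such as the one below, is there a way I can transform the Area Names and Area Codes headings so that they span each level (i.e. so that a single Area Names label and a single Area Codes label span the multiple column heading rows)?
If so, how might I then run queries on the columns to return only the row corresponding to a particular value (for example an area code of E06000047), or the Low and Very High values for ENGLAND in 2012/13?
I wonder whether it would be easier to define a row index based on the area code, the area name, or a two-column row index ['Area Code', 'Area Names']. If so, how can I do this from the current table? set_index seems to balk at the current structure.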
Code to create the above snippet:
import pandas as pd
df= pd.DataFrame({('2011/12*', 'High', '7-8'): {3: 49.83,
5: 50.01,
7: 48.09,
8: 43.58,
9: 44.19},
('2011/12*', 'Low', '0-4'): {3: 6.51, 5: 6.53, 7: 6.49, 8: 6.41, 9: 6.12},
('2011/12*', 'Medium', '5-6'): {3: 17.44,
5: 17.59,
7: 18.11,
8: 19.23,
9: 20.01},
('2011/12*', 'Very High', '9-10'): {3: 26.22,
5: 25.87,
7: 27.32,
8: 30.78,
9: 29.68},
('2012/13*', 'High', '7-8'): {3: 51.16,
5: 51.35,
7: 48.47,
8: 44.67,
9: 49.39},
('2012/13*', 'Low', '0-4'): {3: 5.71, 5: 5.74, 7: 6.73, 8: 8.42, 9: 6.51},
('2012/13*', 'Medium', '5-6'): {3: 17.1,
5: 17.29,
7: 18.46,
8: 20.23,
9: 15.81},
('2012/13*', 'Very High', '9-10'): {3: 26.03,
5: 25.62,
7: 26.34,
8: 26.68,
9: 28.3},
('Area Codes', 'Area Codes', 'Area Codes'): {3: 'K02000001',
5: 'E92000001',
7: 'E12000001',
8: 'E06000047',
9: 'E06000005'},
('Area Names', 'Area Names', 'Area Names'): {3: 'UNITED KINGDOM',
5: 'ENGLAND',
7: 'NORTH EAST',
8: 'County Durham',
9: 'Darlington'}})
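The repeated names in ('Area Codes', 'Area Codes', 'Area Codes') already act as the "spanning" label: with three header rows, every column is addressed by a 3-tuple, so a bare 'Area Codes' string is only a partial key. A minimal sketch that rebuilds the same values row by row (a compact equivalent of the dict above, not the poster's code) and inspects the keys:

```python
import pandas as pd

# Compact rebuild of the question's frame (same values, same row labels 3, 5, 7, 8, 9).
bands = [("High", "7-8"), ("Low", "0-4"), ("Medium", "5-6"), ("Very High", "9-10")]
columns = pd.MultiIndex.from_tuples(
    [(year, band, rng) for year in ("2011/12*", "2012/13*") for band, rng in bands]
    + [("Area Codes",) * 3, ("Area Names",) * 3]
)
rows = [
    [49.83, 6.51, 17.44, 26.22, 51.16, 5.71, 17.10, 26.03, "K02000001", "UNITED KINGDOM"],
    [50.01, 6.53, 17.59, 25.87, 51.35, 5.74, 17.29, 25.62, "E92000001", "ENGLAND"],
    [48.09, 6.49, 18.11, 27.32, 48.47, 6.73, 18.46, 26.34, "E12000001", "NORTH EAST"],
    [43.58, 6.41, 19.23, 30.78, 44.67, 8.42, 20.23, 26.68, "E06000047", "County Durham"],
    [44.19, 6.12, 20.01, 29.68, 49.39, 6.51, 15.81, 28.30, "E06000005", "Darlington"],
]
df = pd.DataFrame(rows, index=[3, 5, 7, 8, 9], columns=columns)

# The header has three rows, so every column key is a 3-tuple.
print(df.columns.nlevels)  # 3

# The full key (name repeated on every level) selects the column as a Series.
codes = df[("Area Codes", "Area Codes", "Area Codes")]
print(codes.tolist())

# A bare string is only a partial key: it selects a one-column sub-frame, not a Series.
print(type(df["Area Codes"]).__name__)
```

This is why the answer below passes the full tuples to set_index.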
I think you need set_index with tuples, because the columns are a MultiIndex:
df.set_index([('Area Codes','Area Codes','Area Codes'),
('Area Names','Area Names','Area Names')], inplace=True)
df.index.names = ['Area Codes','Area Names']
print (df)
                           2011/12*                           2012/13*       \
                               High   Low  Medium  Very High      High   Low
                                7-8   0-4     5-6       9-10       7-8   0-4
Area Codes Area Names
K02000001  UNITED KINGDOM     49.83  6.51   17.44      26.22     51.16  5.71
E92000001  ENGLAND            50.01  6.53   17.59      25.87     51.35  5.74
E12000001  NORTH EAST         48.09  6.49   18.11      27.32     48.47  6.73
E06000047  County Durham      43.58  6.41   19.23      30.78     44.67  8.42
E06000005  Darlington         44.19  6.12   20.01      29.68     49.39  6.51
                           Medium  Very High
                              5-6       9-10
Area Codes Area Names
K02000001  UNITED KINGDOM   17.10      26.03
E92000001  ENGLAND          17.29      25.62
E12000001  NORTH EAST       18.46      26.34
E06000047  County Durham    20.23      26.68
E06000005  Darlington       15.81      28.30
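The same step can also be written without inplace=True, as a chain that leaves df untouched. This sketch assumes the question's frame (rebuilt compactly here) and uses rename_axis in place of assigning df.index.names; both set the level names.

```python
import pandas as pd

# Compact rebuild of the question's frame (same values, same row labels 3, 5, 7, 8, 9).
bands = [("High", "7-8"), ("Low", "0-4"), ("Medium", "5-6"), ("Very High", "9-10")]
columns = pd.MultiIndex.from_tuples(
    [(year, band, rng) for year in ("2011/12*", "2012/13*") for band, rng in bands]
    + [("Area Codes",) * 3, ("Area Names",) * 3]
)
rows = [
    [49.83, 6.51, 17.44, 26.22, 51.16, 5.71, 17.10, 26.03, "K02000001", "UNITED KINGDOM"],
    [50.01, 6.53, 17.59, 25.87, 51.35, 5.74, 17.29, 25.62, "E92000001", "ENGLAND"],
    [48.09, 6.49, 18.11, 27.32, 48.47, 6.73, 18.46, 26.34, "E12000001", "NORTH EAST"],
    [43.58, 6.41, 19.23, 30.78, 44.67, 8.42, 20.23, 26.68, "E06000047", "County Durham"],
    [44.19, 6.12, 20.01, 29.68, 49.39, 6.51, 15.81, 28.30, "E06000005", "Darlington"],
]
df = pd.DataFrame(rows, index=[3, 5, 7, 8, 9], columns=columns)

# set_index takes a list of full column keys; with 3-level columns each key is a tuple.
code_col = ("Area Codes",) * 3
name_col = ("Area Names",) * 3
indexed = df.set_index([code_col, name_col]).rename_axis(["Area Codes", "Area Names"])

print(indexed.index.names)
print(indexed.shape)  # (5, 8): the two area columns moved into the row index
```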
Then you need sort_index, because otherwise slicing raises:
KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)'
df.sort_index(inplace=True)
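To see what sort_index changes, the sketch below (same rebuilt frame, an assumption rather than the original code) checks is_monotonic_increasing on the row index before and after sorting. set_index keeps the original row order, and here the area codes run in descending order, so level 0 is not sorted, which is what "lexsort depth (0)" refers to. The exact exception text differs between pandas versions.

```python
import pandas as pd

# Compact rebuild of the question's frame (same values, same row labels 3, 5, 7, 8, 9).
bands = [("High", "7-8"), ("Low", "0-4"), ("Medium", "5-6"), ("Very High", "9-10")]
columns = pd.MultiIndex.from_tuples(
    [(year, band, rng) for year in ("2011/12*", "2012/13*") for band, rng in bands]
    + [("Area Codes",) * 3, ("Area Names",) * 3]
)
rows = [
    [49.83, 6.51, 17.44, 26.22, 51.16, 5.71, 17.10, 26.03, "K02000001", "UNITED KINGDOM"],
    [50.01, 6.53, 17.59, 25.87, 51.35, 5.74, 17.29, 25.62, "E92000001", "ENGLAND"],
    [48.09, 6.49, 18.11, 27.32, 48.47, 6.73, 18.46, 26.34, "E12000001", "NORTH EAST"],
    [43.58, 6.41, 19.23, 30.78, 44.67, 8.42, 20.23, 26.68, "E06000047", "County Durham"],
    [44.19, 6.12, 20.01, 29.68, 49.39, 6.51, 15.81, 28.30, "E06000005", "Darlington"],
]
df = pd.DataFrame(rows, index=[3, 5, 7, 8, 9], columns=columns)

indexed = df.set_index([("Area Codes",) * 3, ("Area Names",) * 3])
indexed.index.names = ["Area Codes", "Area Names"]

# Rows are still in their original (descending) order after set_index.
before = indexed.index.is_monotonic_increasing
indexed = indexed.sort_index()
# After sort_index the tuples are in lexicographic order.
after = indexed.index.is_monotonic_increasing

print(before, after)
print(indexed.index.get_level_values("Area Codes").tolist())
```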
Finally, select with slicers:
idx = pd.IndexSlice
print (df.loc[idx['E06000047',:], :])
                          2011/12*                           2012/13*       \
                              High   Low  Medium  Very High      High   Low
                               7-8   0-4     5-6       9-10       7-8   0-4
Area Codes Area Names
E06000047  County Durham     43.58  6.41   19.23      30.78     44.67  8.42
                          Medium  Very High
                             5-6       9-10
Area Codes Area Names
E06000047  County Durham   20.23      26.68
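If you only need the rows for one code, a boolean mask on an index level is an alternative to slicers (swapped in here, not part of the answer above) and does not require the index to be sorted. The same idea also works on the original column before set_index. A sketch using the rebuilt frame:

```python
import pandas as pd

# Compact rebuild of the question's frame (same values, same row labels 3, 5, 7, 8, 9).
bands = [("High", "7-8"), ("Low", "0-4"), ("Medium", "5-6"), ("Very High", "9-10")]
columns = pd.MultiIndex.from_tuples(
    [(year, band, rng) for year in ("2011/12*", "2012/13*") for band, rng in bands]
    + [("Area Codes",) * 3, ("Area Names",) * 3]
)
rows = [
    [49.83, 6.51, 17.44, 26.22, 51.16, 5.71, 17.10, 26.03, "K02000001", "UNITED KINGDOM"],
    [50.01, 6.53, 17.59, 25.87, 51.35, 5.74, 17.29, 25.62, "E92000001", "ENGLAND"],
    [48.09, 6.49, 18.11, 27.32, 48.47, 6.73, 18.46, 26.34, "E12000001", "NORTH EAST"],
    [43.58, 6.41, 19.23, 30.78, 44.67, 8.42, 20.23, 26.68, "E06000047", "County Durham"],
    [44.19, 6.12, 20.01, 29.68, 49.39, 6.51, 15.81, 28.30, "E06000005", "Darlington"],
]
df = pd.DataFrame(rows, index=[3, 5, 7, 8, 9], columns=columns)
code_col = ("Area Codes",) * 3

# Before set_index: compare the full-key column against the value.
by_column = df[df[code_col] == "E06000047"]

# After set_index (unsorted is fine): compare one named index level against the value.
indexed = df.set_index([code_col, ("Area Names",) * 3])
indexed.index.names = ["Area Codes", "Area Names"]
mask = indexed.index.get_level_values("Area Codes") == "E06000047"
durham = indexed[mask]

print(durham)
```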
print (df.loc[idx[:,'ENGLAND'], idx['2012/13*',['Low','Very High']]])
                       2012/13*
                            Low  Very High
                            0-4       9-10
Area Codes Area Names
E92000001  ENGLAND         5.74      25.62
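For single values, a full row key plus a full column key returns a scalar, and xs selects rows by one named level. A sketch under the same assumptions (rebuilt frame, index set and sorted as above):

```python
import pandas as pd

# Compact rebuild of the question's frame (same values, same row labels 3, 5, 7, 8, 9).
bands = [("High", "7-8"), ("Low", "0-4"), ("Medium", "5-6"), ("Very High", "9-10")]
columns = pd.MultiIndex.from_tuples(
    [(year, band, rng) for year in ("2011/12*", "2012/13*") for band, rng in bands]
    + [("Area Codes",) * 3, ("Area Names",) * 3]
)
rows = [
    [49.83, 6.51, 17.44, 26.22, 51.16, 5.71, 17.10, 26.03, "K02000001", "UNITED KINGDOM"],
    [50.01, 6.53, 17.59, 25.87, 51.35, 5.74, 17.29, 25.62, "E92000001", "ENGLAND"],
    [48.09, 6.49, 18.11, 27.32, 48.47, 6.73, 18.46, 26.34, "E12000001", "NORTH EAST"],
    [43.58, 6.41, 19.23, 30.78, 44.67, 8.42, 20.23, 26.68, "E06000047", "County Durham"],
    [44.19, 6.12, 20.01, 29.68, 49.39, 6.51, 15.81, 28.30, "E06000005", "Darlington"],
]
df = pd.DataFrame(rows, index=[3, 5, 7, 8, 9], columns=columns)

indexed = df.set_index([("Area Codes",) * 3, ("Area Names",) * 3]).sort_index()
indexed.index.names = ["Area Codes", "Area Names"]

# Full row key (code, name) and full 3-level column key give one scalar each.
england = ("E92000001", "ENGLAND")
low = indexed.loc[england, ("2012/13*", "Low", "0-4")]
very_high = indexed.loc[england, ("2012/13*", "Very High", "9-10")]
print(low, very_high)

# xs picks rows by a single named level and drops that level from the result.
england_rows = indexed.xs("ENGLAND", level="Area Names")
print(england_rows)
```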