Pandas dataframe - multiindex for both rows and columns?
Imagine this is my input data:
data = [("France", "Paris", "Male", "1"),
("France", "Paris", "Female", "6"),
("France", "Nice", "Male", "2"),
("France", "Nice", "Female", "7"),
("Germany", "Berlin", "Male", "3"),
("Germany", "Berlin", "Female", "8"),
("Germany", "Munchen", "Male", "4"),
("Germany", "Munchen", "Female", "9"),
("Germany", "Koln", "Male", "5"),
("Germany", "Koln", "Female", "10")]
I would like to put it into a dataframe like this:
Country  City     Sex
                  Male  Female
France   Paris    1     6
         Nice     2     7
Germany  Berlin   3     8
         Munchen  4     9
         Koln     5     10
The first part is easy:
import pandas as pd

df = pd.DataFrame(data, columns=["country", "city", "sex", "count"])
df = df.set_index(["country", "city"])
which gives me this output:
                     sex  count
country  city
France   Paris      Male      1
         Paris    Female      6
         Nice       Male      2
         Nice     Female      7
Germany  Berlin     Male      3
         Berlin   Female      8
         Munchen    Male      4
         Munchen  Female      9
         Koln       Male      5
         Koln     Female     10
So the rows are fine, but now I would like to put the values of the 'sex' column into a column multiindex. Is that possible, and if so, how?
Add the sex column to the list passed to set_index, then call unstack:
# df here is the original DataFrame, before the earlier set_index call
df = df.set_index(["country", "city", "sex"]).unstack()
# data cleaning: remove the column level names and rename the 'count' level to 'Sex'
df = df.rename_axis((None, None), axis=1).rename(columns={'count': 'Sex'})
print(df)
                   Sex
                Female Male
country city
France  Nice         7    2
        Paris        6    1
Germany Berlin       8    3
        Koln        10    5
        Munchen      9    4
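For reference, the steps above can be put together in one self-contained sketch. It uses the same data as the question; the names wide and order are only illustrative. Because unstack sorts both axes, the last two statements are an optional extra (not part of the answer above) that restores the Male/Female and Paris/Nice ordering shown in the desired layout.

```python
import pandas as pd

data = [("France", "Paris", "Male", "1"),
        ("France", "Paris", "Female", "6"),
        ("France", "Nice", "Male", "2"),
        ("France", "Nice", "Female", "7"),
        ("Germany", "Berlin", "Male", "3"),
        ("Germany", "Berlin", "Female", "8"),
        ("Germany", "Munchen", "Male", "4"),
        ("Germany", "Munchen", "Female", "9"),
        ("Germany", "Koln", "Male", "5"),
        ("Germany", "Koln", "Female", "10")]

df = pd.DataFrame(data, columns=["country", "city", "sex", "count"])

# Put all three key columns into the row index, then move the innermost
# level ("sex") from the rows to the columns.
wide = df.set_index(["country", "city", "sex"]).unstack()

# Clear the column level names and relabel the top level "count" as "Sex".
wide = wide.rename_axis([None, None], axis=1).rename(columns={"count": "Sex"})

# Optional: restore the ordering of the input instead of the sorted one.
wide = wide[[("Sex", "Male"), ("Sex", "Female")]]
order = pd.MultiIndex.from_frame(df[["country", "city"]].drop_duplicates())
wide = wide.reindex(order)

print(wide)
```

Note that the counts stay strings here, exactly as in the input data; converting them with df["count"].astype(int) before reshaping would give numeric cells.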
Another way is to use pivot instead of unstack (the two do almost the same thing), i.e.
df.set_index(['country','city']).pivot(columns='sex')
                 count
sex             Female Male
country city
France  Nice         7    2
        Paris        6    1
Germany Berlin       8    3
        Koln        10    5
        Munchen      9    4
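To check the "almost the same" claim, the following sketch builds the frame both ways and compares the results. The variable names are illustrative, and the third variant, which passes a list to pivot's index parameter, is an addition that assumes pandas 1.1 or later.

```python
import pandas as pd

data = [("France", "Paris", "Male", "1"),
        ("France", "Paris", "Female", "6"),
        ("France", "Nice", "Male", "2"),
        ("France", "Nice", "Female", "7"),
        ("Germany", "Berlin", "Male", "3"),
        ("Germany", "Berlin", "Female", "8"),
        ("Germany", "Munchen", "Male", "4"),
        ("Germany", "Munchen", "Female", "9"),
        ("Germany", "Koln", "Male", "5"),
        ("Germany", "Koln", "Female", "10")]

df = pd.DataFrame(data, columns=["country", "city", "sex", "count"])

# Route 1: index on all three keys, then unstack the innermost level.
via_unstack = df.set_index(["country", "city", "sex"]).unstack()

# Route 2: keep country/city as the row index and pivot on "sex".
via_pivot = df.set_index(["country", "city"]).pivot(columns="sex")

# Route 3 (assumes pandas >= 1.1): name the row keys directly in pivot.
via_pivot_index = df.pivot(index=["country", "city"], columns="sex")

# Compare labels and values of the three results.
print(via_unstack.equals(via_pivot))
print(via_unstack.equals(via_pivot_index))
```

One practical difference: both routes raise an error if a (country, city, sex) combination occurs more than once, so duplicated keys need to be aggregated first.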