MultiIndex Group By in Pandas Data Frame
I have a data set that contains countries and statistics on economic indicators by year, organized like so:
Country Metric 2011 2012 2013 2014
USA GDP 7 4 0 2
USA Pop. 2 3 0 3
GB GDP 8 7 0 7
GB Pop. 2 6 0 0
FR GDP 5 0 0 1
FR Pop. 1 1 0 5
How can I use a MultiIndex in pandas to create a data frame that shows only GDP by year for each country?
I tried:
df = data.groupby(['Country', 'Metric'])
but it didn't work properly.
In this case, you don't actually need a groupby. You also don't have a MultiIndex. You can make one like this:
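import pandas
from io import StringIO

datastring = StringIO("""\
Country Metric 2011 2012 2013 2014
USA GDP 7 4 0 2
USA Pop. 2 3 0 3
GB GDP 8 7 0 7
GB Pop. 2 6 0 0
FR GDP 5 0 0 1
FR Pop. 1 1 0 5
""")
# Fields are separated by runs of whitespace; the raw string keeps "\s" intact.
data = pandas.read_table(datastring, sep=r'\s+')
# Turn the Country and Metric columns into a two-level row index (a MultiIndex).
data.set_index(['Country', 'Metric'], inplace=True)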
Then data looks like this:
2011 2012 2013 2014
Country Metric
USA GDP 7 4 0 2
Pop. 2 3 0 3
GB GDP 8 7 0 7
Pop. 2 6 0 0
FR GDP 5 0 0 1
Pop. 1 1 0 5
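A quick way to confirm that set_index produced a two-level MultiIndex is to inspect the index directly. This is a minimal sketch; it builds the same numbers with a pd.DataFrame literal instead of parsing text, an assumption made only to keep it self-contained.

```python
import pandas as pd

# Same values as the question's table, built directly rather than via StringIO.
data = pd.DataFrame({
    "Country": ["USA", "USA", "GB", "GB", "FR", "FR"],
    "Metric": ["GDP", "Pop.", "GDP", "Pop.", "GDP", "Pop."],
    "2011": [7, 2, 8, 2, 5, 1],
    "2012": [4, 3, 7, 6, 0, 1],
    "2013": [0, 0, 0, 0, 0, 0],
    "2014": [2, 3, 7, 0, 1, 5],
}).set_index(["Country", "Metric"])

# The row index now has two named levels.
print(list(data.index.names))  # ['Country', 'Metric']
print(data.index.nlevels)      # 2

# A full (Country, Metric) key selects exactly one row, returned as a Series.
print(data.loc[("USA", "GDP"), :].tolist())  # [7, 4, 0, 2]
```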
Now, to get the GDPs, you can take a cross-section of the data frame via the xs method:
data.xs('GDP', level='Metric')
2011 2012 2013 2014
Country
USA 7 4 0 2
GB 8 7 0 7
FR 5 0 0 1
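For comparison, here is a sketch of two variations on the same cross-section: keeping the Metric level with drop_level=False, and selecting the same rows through .loc with pd.IndexSlice. The data is rebuilt from a literal (an assumption for self-containment), and gdp_kept and gdp_loc are illustrative names only.

```python
import pandas as pd

# Same values as the question's table, indexed by (Country, Metric).
data = pd.DataFrame({
    "Country": ["USA", "USA", "GB", "GB", "FR", "FR"],
    "Metric": ["GDP", "Pop.", "GDP", "Pop.", "GDP", "Pop."],
    "2011": [7, 2, 8, 2, 5, 1],
    "2012": [4, 3, 7, 6, 0, 1],
    "2013": [0, 0, 0, 0, 0, 0],
    "2014": [2, 3, 7, 0, 1, 5],
}).set_index(["Country", "Metric"])

# xs drops the level it selected on by default...
gdp = data.xs("GDP", level="Metric")
# ...while drop_level=False keeps Metric in the result's index.
gdp_kept = data.xs("GDP", level="Metric", drop_level=False)

# Equivalent selection with .loc and pd.IndexSlice. Slicing a MultiIndex is
# safest on a sorted index, hence sort_index() first (rows come out FR, GB, USA).
idx = pd.IndexSlice
gdp_loc = data.sort_index().loc[idx[:, "GDP"], :]

print(gdp)
print(gdp_loc)
```

xs is the more direct spelling for a single label on one level; the IndexSlice form generalizes to selecting on several levels at once.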
It's this easy because your data are already pivoted (unstacked). If they weren't, and looked like this:
data.columns.names = ['Year']
data = data.stack()
data
Country Metric Year
USA GDP 2011 7
2012 4
2013 0
2014 2
Pop. 2011 2
2012 3
2013 0
2014 3
GB GDP 2011 8
2012 7
2013 0
2014 7
Pop. 2011 2
2012 6
2013 0
2014 0
FR GDP 2011 5
2012 0
2013 0
2014 1
Pop. 2011 1
2012 1
2013 0
2014 5
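stack and unstack are inverses here: stacking moves the Year columns into the row index, and unstacking that level moves it back out. A minimal sketch of the round trip, assuming the data is rebuilt from a literal; stacked and wide are illustrative names.

```python
import pandas as pd

# Same values as the question's table, indexed by (Country, Metric).
data = pd.DataFrame({
    "Country": ["USA", "USA", "GB", "GB", "FR", "FR"],
    "Metric": ["GDP", "Pop.", "GDP", "Pop.", "GDP", "Pop."],
    "2011": [7, 2, 8, 2, 5, 1],
    "2012": [4, 3, 7, 6, 0, 1],
    "2013": [0, 0, 0, 0, 0, 0],
    "2014": [2, 3, 7, 0, 1, 5],
}).set_index(["Country", "Metric"])
data.columns.names = ["Year"]

# stack() turns the four year columns into a third index level (a long Series).
stacked = data.stack()
# unstack("Year") moves that level back out into columns (row order may change).
wide = stacked.unstack("Year")

print(stacked.shape)  # (24,)
print(wide.shape)     # (6, 4)
```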
You could then use groupby to tell you something about the world as a whole:
data.groupby(level=['Metric', 'Year']).sum()
Metric Year
GDP 2011 20
2012 11
2013 0
2014 10
Pop. 2011 5
2012 10
2013 0
2014 8
Or get really fancy:
data.groupby(level=['Metric', 'Year']).sum().unstack(level='Metric')
Metric GDP Pop.
Year
2011 20 5
2012 11 10
2013 0 0
2014 10 8
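The stacked (long) data can also answer the original question directly: take the GDP cross-section with xs, then unstack the Year level so there is one row per country and one column per year. A minimal sketch, assuming the data is rebuilt from a literal; gdp_by_year is an illustrative name.

```python
import pandas as pd

# Same values as the question's table, indexed by (Country, Metric).
data = pd.DataFrame({
    "Country": ["USA", "USA", "GB", "GB", "FR", "FR"],
    "Metric": ["GDP", "Pop.", "GDP", "Pop.", "GDP", "Pop."],
    "2011": [7, 2, 8, 2, 5, 1],
    "2012": [4, 3, 7, 6, 0, 1],
    "2013": [0, 0, 0, 0, 0, 0],
    "2014": [2, 3, 7, 0, 1, 5],
}).set_index(["Country", "Metric"])
data.columns.names = ["Year"]
stacked = data.stack()

# Pick the GDP entries out of the long Series, then spread Year into columns.
gdp_by_year = stacked.xs("GDP", level="Metric").unstack("Year")
print(gdp_by_year)

# Summing over countries reproduces the world GDP figures from the groupby.
print(gdp_by_year.sum().tolist())  # [20, 11, 0, 10]
```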
Is this what you are looking for?
# df is the flat frame, with Country and Metric as ordinary columns
gdp = df.groupby('Metric').get_group('GDP')
gdp
Country Metric 2011 2012 2013 2014
0 USA GDP 7 4 0 2
2 GB GDP 8 7 0 7
4 FR GDP 5 0 0 1
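A groupby call on its own returns a GroupBy object rather than a filtered table, which is likely why the attempt in the question didn't look like it worked; get_group (or an aggregation) is what produces a DataFrame. For a single-value filter like this, a boolean mask gives the same rows. A minimal sketch, assuming df is the flat frame with a default RangeIndex; via_group and via_mask are illustrative names.

```python
import pandas as pd

# Assumed: the flat frame from the question, Country and Metric as columns.
df = pd.DataFrame({
    "Country": ["USA", "USA", "GB", "GB", "FR", "FR"],
    "Metric": ["GDP", "Pop.", "GDP", "Pop.", "GDP", "Pop."],
    "2011": [7, 2, 8, 2, 5, 1],
    "2012": [4, 3, 7, 6, 0, 1],
    "2013": [0, 0, 0, 0, 0, 0],
    "2014": [2, 3, 7, 0, 1, 5],
})

# groupby returns a lazy GroupBy object, not a DataFrame.
grouped = df.groupby("Metric")
print(type(grouped).__name__)  # DataFrameGroupBy

# get_group pulls out one group's rows, keeping their original index labels...
via_group = grouped.get_group("GDP")
# ...which for a simple filter matches a boolean mask.
via_mask = df[df["Metric"] == "GDP"]
print(via_mask)
```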