MultiIndex DataFrame - Getting only the possible values of a lower level index given an upper level index value
When I slice into a MultiIndex DataFrame by a level 0 index value, I want to know the possible level 1+ index values that fall under that initial value. If my wording doesn't make sense, here's an example:
>>> import numpy as np
>>> import pandas as pd
>>> arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
... ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'],
... ['a','b','a','b','b','b','b','b']]
>>> tuples = list(zip(*arrays))
>>> index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second','third'])
>>> s = pd.Series(np.random.randn(8), index=index)
>>> s
first second third
bar one a -0.598684
two b 0.351421
baz one a -0.618285
two b -1.175418
foo one b -0.093806
two b 1.092197
qux one b -1.515515
two b 0.741408
dtype: float64
The index of s looks like this:
>>> s.index
MultiIndex(levels=[[u'bar', u'baz', u'foo', u'qux'], [u'one', u'two'], [u'a', u'b']],
labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1], [0, 1, 0, 1, 1, 1, 1, 1]],
names=[u'first', u'second', u'third'])
When I take just the section of s whose first index value is foo and look at the index of the result, I get:
>>> s_foo = s.loc['foo']
>>> s_foo
second third
one b -0.093806
two b 1.092197
dtype: float64
>>> s_foo.index
MultiIndex(levels=[[u'one', u'two'], [u'a', u'b']],
labels=[[0, 1], [1, 1]],
names=[u'second', u'third'])
I want the index of s_foo to act as if the higher level of s does not exist. Yet the levels attribute of s_foo.index still lists a as a potential value of the third level, even though b is the only value that actually occurs in s_foo.
Essentially, what I want to find are all the possible third values of s_foo, i.e. b and only b. Right now I do set(s_foo.reset_index()['third']), but I was hoping for a more elegant solution.
You can create s_foo and explicitly drop the unused levels:
s_foo = s.loc['foo']
s_foo.index = s_foo.index.remove_unused_levels()
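As a rough, self-contained sketch of this approach, the snippet below rebuilds the series from the question and reads the third-level values that actually occur under foo. The values in s are placeholders (np.arange instead of random numbers), and the variable names pruned_index, third_levels and third_values are made up for illustration. It also shows get_level_values, which is a separate technique that reads the observed values directly without pruning the levels first.

```python
import numpy as np
import pandas as pd

# Same index as in the question; the data values are arbitrary placeholders.
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'],
          ['a', 'b', 'a', 'b', 'b', 'b', 'b', 'b']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second', 'third'])
s = pd.Series(np.arange(8.0), index=index)

s_foo = s.loc['foo']  # remaining index levels: 'second', 'third'

# Option 1: prune levels that no longer occur, then read .levels.
# Position 1 is 'third' because 'first' was consumed by the .loc lookup.
pruned_index = s_foo.index.remove_unused_levels()
third_levels = list(pruned_index.levels[1])

# Option 2: read the values actually present on the level.
third_values = list(s_foo.index.get_level_values('third').unique())

print(third_levels, third_values)
```

Both lists should contain only b for this data. remove_unused_levels returns a new MultiIndex rather than modifying the existing one, which is why the answer assigns the result back to s_foo.index.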
Resetting the index seems like the right way to go. It seems you don't want it to be an index (the result you're getting is just the way indexes work).
s.reset_index(level=2).groupby(level=[0])['third'].unique()
Or, if you want counts:
s.reset_index(level=2).groupby(level=[0])['third'].value_counts()
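For reference, here is a minimal runnable sketch of the two one-liners above applied to the series from the question. The data values are placeholders, the names by_first and counts are made up for illustration, and the levels are referred to by name ('third', 'first') instead of by position, which should be equivalent for this index.

```python
import numpy as np
import pandas as pd

# Same index as in the question; the data values are arbitrary placeholders.
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'],
          ['a', 'b', 'a', 'b', 'b', 'b', 'b', 'b']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second', 'third'])
s = pd.Series(np.arange(8.0), index=index)

# Move 'third' out of the index into a regular column,
# leaving 'first' and 'second' as the index.
frame = s.reset_index(level='third')

# Unique 'third' values observed under each 'first' value.
by_first = frame.groupby(level='first')['third'].unique()

# Number of rows per ('first', 'third') combination.
counts = frame.groupby(level='first')['third'].value_counts()

print(by_first)
print(counts)
```

This computes the answer for every first value at once, so by_first['foo'] gives the values for foo without slicing s beforehand.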