I am trying to subset hierarchical data that has two row ids.
Say I have data in hdf
index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
['one', 'two', 'three']],
labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
[0, 1, 2, 0, 1, 1, 2, 0, 1, 2]])
hdf = DataFrame(np.random.randn(10, 3), index=index,
columns=['A', 'B', 'C'])
hdf
And I wish to subset so that i see foo
and qux
, subset to return only sub-row two
and columns A
and C
.
I can do this in two steps as follows:
sub1 = hdf.ix[['foo','qux'], ['A', 'C']]
sub1.xs('two', level=1)
Is there a single-step way to do this?
thanks
Doesn't look the nicest, but use tuples to get the rows you want and then squares brackets to select the columns.
In [36]: hdf.loc[[('foo', 'two'), ('qux', 'two')]][['A', 'C']]
Out[36]:
A C
foo two -0.356165 0.565022
qux two -0.701186 0.026532
loc
could be swapped out for ix
here.
In [125]: hdf[hdf.index.get_level_values(0).isin(['foo', 'qux']) & (hdf.index.get_level_values(1) == 'two')][['A', 'C']]
Out[125]:
A C
foo two -0.113320 -1.215848
qux two 0.953584 0.134363
Much more complicated, but it would be better if you have many different values you want to choose in level one.
itertools
to the rescue:
>>> from itertools import product
>>>
>>> def _p(*iterables):
... return list(product(*iterables))
...
>>> hdf.ix[ _p(('foo','qux'),('two',)), ['A','C'] ]
A C
foo two 1.125401 1.389568
qux two 1.051455 -0.271256
>>>
Thanks everyone for your help. I also hit upon this solution:
hdf.ix[['bar','qux'], ['A', 'C']].xs('two', level=1)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.