Subsetting hierarchical data in pandas

Question

I am trying to subset hierarchical data that has two row ids.

Say I have the following data in hdf:

import numpy as np
from pandas import DataFrame, MultiIndex

index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
                           ['one', 'two', 'three']],
                   # `codes` was named `labels` before pandas 0.24
                   codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
                          [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]])
hdf = DataFrame(np.random.randn(10, 3), index=index,
                columns=['A', 'B', 'C'])
hdf

I want to select the rows for foo and qux, keep only the sub-row two within them, and return only columns A and C.

I can do this in two steps as follows:

sub1 = hdf.loc[['foo', 'qux'], ['A', 'C']]
sub1.xs('two', level=1)

Is there a single-step way to do this?

Thanks.

Answer 1

It doesn't look the nicest, but you can use tuples to get the rows you want and then square brackets to select the columns.

In [36]: hdf.loc[[('foo', 'two'), ('qux', 'two')]][['A', 'C']]
Out[36]: 
                A         C
foo two -0.356165  0.565022
qux two -0.701186  0.026532

On older pandas versions, ix could be used in place of loc here; ix was removed in pandas 1.0.
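
As a side note, the row tuples and the column list can probably be passed to a single loc call instead of chaining two sets of brackets. The following is a minimal sketch, assuming a recent pandas where MultiIndex takes `codes` rather than `labels`; the names `rows`, `chained` and `single` are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question (codes= replaces the older labels=).
index = pd.MultiIndex(
    levels=[['foo', 'bar', 'baz', 'qux'], ['one', 'two', 'three']],
    codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
           [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
)
hdf = pd.DataFrame(np.random.randn(10, 3), index=index,
                   columns=['A', 'B', 'C'])

# Full index keys for the rows we want.
rows = [('foo', 'two'), ('qux', 'two')]

# Chained form from the answer: select rows, then select columns.
chained = hdf.loc[rows][['A', 'C']]

# Single call: rows and columns passed to loc together.
single = hdf.loc[rows, ['A', 'C']]
```

Both forms should return the same two rows; the single call avoids building the intermediate three-column frame.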

Answer 2

In [125]: hdf[hdf.index.get_level_values(0).isin(['foo', 'qux']) & (hdf.index.get_level_values(1) == 'two')][['A', 'C']]
Out[125]: 
                A         C
foo two -0.113320 -1.215848
qux two  0.953584  0.134363

This is more complicated, but it works better if you have many different values you want to select in the first level (level 0).
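
To make the "many values" case more concrete, here is a rough sketch of the same mask approach with the wanted values kept in lists. The names `outer_wanted` and `inner_wanted` are made up for illustration, and the frame is rebuilt with `codes=` on the assumption of a recent pandas.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question (codes= replaces the older labels=).
index = pd.MultiIndex(
    levels=[['foo', 'bar', 'baz', 'qux'], ['one', 'two', 'three']],
    codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
           [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
)
hdf = pd.DataFrame(np.random.randn(10, 3), index=index,
                   columns=['A', 'B', 'C'])

# Hypothetical selections; either list can grow without changing the code below.
outer_wanted = ['foo', 'qux']
inner_wanted = ['two']

# One boolean array per index level, combined element-wise.
outer = hdf.index.get_level_values(0)
inner = hdf.index.get_level_values(1)
mask = outer.isin(outer_wanted) & inner.isin(inner_wanted)

# Apply the row mask and the column selection in one loc call.
result = hdf.loc[mask, ['A', 'C']]
```

Using isin on both levels means adding another value to either list is a one-word change.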

Answer 3

itertools to the rescue:

>>> from itertools import product
>>> 
>>> def _p(*iterables):
...     return list(product(*iterables))
... 
>>> hdf.loc[_p(('foo', 'qux'), ('two',)), ['A', 'C']]
                A         C
foo two  1.125401  1.389568
qux two  1.051455 -0.271256
>>>
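
One thing to watch with the product approach is that it generates every combination, including ones that may not exist in the index. The sketch below is one possible way to guard against that by filtering the keys first; the request for ('foo', 'bar') crossed with ('two', 'three') is a hypothetical example chosen because ('bar', 'three') is absent from this data.

```python
from itertools import product

import numpy as np
import pandas as pd

# Rebuild the frame from the question (codes= replaces the older labels=).
index = pd.MultiIndex(
    levels=[['foo', 'bar', 'baz', 'qux'], ['one', 'two', 'three']],
    codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
           [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
)
hdf = pd.DataFrame(np.random.randn(10, 3), index=index,
                   columns=['A', 'B', 'C'])

# Every combination of the requested outer and inner labels.
candidates = list(product(('foo', 'bar'), ('two', 'three')))

# Keep only the combinations that are actually present in the index.
keys = [key for key in candidates if key in hdf.index]

result = hdf.loc[keys, ['A', 'C']]
```

Filtering up front means the lookup does not depend on how a given pandas version treats missing keys in a list passed to loc.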

Answer 4

Thanks everyone for your help. I also hit upon this solution:

hdf.loc[['foo', 'qux'], ['A', 'C']].xs('two', level=1)
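
Note that xs drops the selected level by default, so the result of the chained call is indexed by the outer label only, unlike the other answers. A small sketch of the difference, assuming a recent pandas and using the `drop_level` parameter of xs; the names `sub`, `dropped` and `kept` are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question (codes= replaces the older labels=).
index = pd.MultiIndex(
    levels=[['foo', 'bar', 'baz', 'qux'], ['one', 'two', 'three']],
    codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
           [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
)
hdf = pd.DataFrame(np.random.randn(10, 3), index=index,
                   columns=['A', 'B', 'C'])

# Outer labels and columns first, as in the chained one-liner.
sub = hdf.loc[['foo', 'qux'], ['A', 'C']]

# Default: the 'two' level is removed, leaving a flat index of outer labels.
dropped = sub.xs('two', level=1)

# drop_level=False keeps both levels in the result.
kept = sub.xs('two', level=1, drop_level=False)
```

If the result needs to line up with output from the tuple-based answers, the `drop_level=False` form is probably the one to use.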

4 answers

solution1
2 2013-07-15 02:48:54

solution2
2 ACCEPTED 2013-07-15 03:24:54

solution3
1 2013-07-15 03:41:35

solution4
1 2013-07-15 08:33:33
