How to slice multiindex dataframe with list of labels on one level

MultiIndex dataframes are very powerful, but I think there is not enough clear documentation on them, especially for the different types of slicing. Here is my question:

How can I slice a multi-indexed dataframe on just one level with a list of labels? I am looking for a solution that does not reset the index and convert the dataframe to a single-level index, which is the obvious approach but not an efficient one.

For example, we have the following dataframe:

import pandas as pd
import numpy as np

df = pd.DataFrame(index=range(10))
df['id'] = pd.Series(range(10,20))
df['name'] = [f'name_{id}' for id in range(10,20)]
df['price'] = np.random.rand(df.index.size)
df['date'] = pd.date_range('20200310', '20200319')
df = df.set_index(['id', 'date'])
df

Slicing on a single label works fine:

df.xs('2020-03-10', level='date', drop_level=False)

But how can we slice on a list of labels on that level?

df.xs(('2020-03-10', '2020-03-11', '2020-03-12'), level='date', drop_level=False)

This raises an exception.

However, the pandas documentation says that the "key" parameter can also be a tuple:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.xs.html

To filter by multiple values, use Index.get_level_values with Index.isin and boolean indexing:

a = df[df.index.get_level_values('date').isin(('2020-03-10', '2020-03-11', '2020-03-12'))]
print(a)
                  name     price
id date                         
10 2020-03-10  name_10  0.557772
11 2020-03-11  name_11  0.122315
12 2020-03-12  name_12  0.775976
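
As a self-contained sketch of the same boolean-mask idea, the snippet below rebuilds a frame shaped like the one in the question and keeps the mask as a separate variable so it can be inspected. Converting the labels with pd.to_datetime first is my own assumption, made so the comparison does not depend on string parsing inside isin; the names wanted, mask and subset are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question (prices are random).
ids = list(range(10, 20))
df = pd.DataFrame(
    {
        "id": ids,
        "name": [f"name_{i}" for i in ids],
        "price": np.random.rand(len(ids)),
        "date": pd.date_range("2020-03-10", "2020-03-19"),
    }
).set_index(["id", "date"])

# Assumption: convert the labels to timestamps up front instead of
# passing raw strings to isin.
wanted = pd.to_datetime(["2020-03-10", "2020-03-11", "2020-03-12"])

# One boolean per row: True where the 'date' level is in the wanted labels.
mask = df.index.get_level_values("date").isin(wanted)

# Boolean indexing keeps both index levels intact.
subset = df[mask]
print(subset)
```

Because the mask is an ordinary boolean array, it can be combined with other conditions using & and | before indexing.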

Regarding the remark that the documentation allows a tuple for "key": a tuple can be used, but it works differently. It selects by one label per level, for example:

b = df.xs((10, '2020-03-10'), drop_level=False)
print(b)
name      name_10
price    0.348808
Name: (10, 2020-03-10 00:00:00), dtype: object

c = df.xs((10, '2020-03-10'), level=('id','date'), drop_level=False)
print(c)
                  name     price
id date                         
10 2020-03-10  name_10  0.239876

As @yatu mentioned, another solution uses IndexSlice, with the first : selecting everything on the first level and the last : selecting all columns:

result = df.loc[pd.IndexSlice[:, ['2020-03-10', '2020-03-11', '2020-03-12']], :]
print(result)
                  name     price
id date                         
10 2020-03-10  name_10  0.557488
11 2020-03-11  name_11  0.592082
12 2020-03-12  name_12  0.547747

Tuples are used when accessing a MultiIndex to address the different levels of the hierarchy. They are meant for that purpose, not as a way of passing multiple items within the same level. For multiple selections within the same level you need another approach, such as the one in jezrael's answer.

dates = ['2020-03-10', '2020-03-11', '2020-03-12']
filtered_df = df[df.index.get_level_values('date').isin(dates)]

This is a slight variation on the answer provided by @jezrael.

You can use .loc combined with slice(None) like this:

dates = ['2020-03-10', '2020-03-11', '2020-03-12']

df.loc[(slice(None), dates), :]


                  name    price
id date                        
10 2020-03-10  name_10  0.36806
11 2020-03-11  name_11  0.20436
12 2020-03-12  name_12  0.00443

The first argument to .loc is a tuple that selects rows in the MultiIndex. slice(None) takes all values from the first level, id. The list dates filters keys in the second level, date. The second argument, :, selects all columns.
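
To make the relationship between the two .loc spellings concrete, here is a small sketch that checks that pd.IndexSlice[:, dates] produces the same row selector as the hand-written tuple (slice(None), dates). The frame is rebuilt so the snippet runs on its own; using Timestamp objects for dates rather than strings is an assumption on my side, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question (prices are random).
ids = list(range(10, 20))
df = pd.DataFrame(
    {
        "id": ids,
        "name": [f"name_{i}" for i in ids],
        "price": np.random.rand(len(ids)),
        "date": pd.date_range("2020-03-10", "2020-03-19"),
    }
).set_index(["id", "date"])

# Assumption: use Timestamp labels rather than strings.
dates = list(pd.to_datetime(["2020-03-10", "2020-03-11", "2020-03-12"]))

# IndexSlice only builds the tuple; ':' inside it becomes slice(None).
selector = pd.IndexSlice[:, dates]
same_selector = selector == (slice(None), dates)

# Both spellings select the same rows and all columns.
via_tuple = df.loc[(slice(None), dates), :]
via_index_slice = df.loc[selector, :]
print(same_selector)
print(via_index_slice)
```

IndexSlice is mostly a readability aid: it lets you write : instead of slice(None) inside the row selector.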

The pandas documentation on MultiIndex / advanced indexing states:

It is important to note that tuples and lists are not treated identically in pandas when it comes to indexing. Whereas a tuple is interpreted as one multi-level key, a list is used to specify several keys. Or in other words, tuples go horizontally (traversing levels), lists go vertically (scanning levels).
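
The distinction in that quote can be sketched against the dataframe from the question. The snippet below is only an illustration under my own choice of labels: a tuple addresses one row by giving a label for each level, a list of tuples addresses several full keys, and a plain list addresses several labels on the first level.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question (prices are random).
ids = list(range(10, 20))
df = pd.DataFrame(
    {
        "id": ids,
        "name": [f"name_{i}" for i in ids],
        "price": np.random.rand(len(ids)),
        "date": pd.date_range("2020-03-10", "2020-03-19"),
    }
).set_index(["id", "date"])

ts10 = pd.Timestamp("2020-03-10")
ts11 = pd.Timestamp("2020-03-11")

# Tuple: ONE key that spans both levels (id=10, date=2020-03-10).
single_name = df.loc[(10, ts10), "name"]

# List of tuples: SEVERAL complete keys.
by_tuples = df.loc[[(10, ts10), (11, ts11)]]

# Plain list: SEVERAL labels on the first level only.
by_list = df.loc[[10, 11]]

print(single_name)
print(by_tuples)
print(by_list)
```

This is why passing a tuple of three dates to xs fails in the question: the tuple is read as one key with three levels, not as three alternatives on the date level.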
