
Slicing MultiIndexed DataFrame in pandas

I have the following DataFrame named pop:

California  2000    33871648
            2010    37253956
New York    2000    18976457
            2010    19378102
Texas       2000    20851820
            2010    25145561

I want to print the 2010 values for California and Texas. Whenever I try pop[['California','Texas'], 2010] I get the error '(['California', 'Texas'], 2010)' is an invalid key. How can I print this information?

TLDR

df.loc[(level_1_indices_list, level_2_indices_list), :]

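which is, in this case:

df.loc[(['California','Texas'], [2010]), :]

Here 2010 is written without quotes on the assumption that the year level holds integers, as the printed output suggests; use '2010' if it holds strings. If pop is actually a Series rather than a DataFrame, drop the trailing ", :".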
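As a minimal sketch of the one-liner applied to the question's data, the snippet below rebuilds pop by hand. The level names state and year, the name population, and the integer years are assumptions; the question does not state them.

```python
import pandas as pd

# Rebuild the data from the question (level names and integer years are assumed).
index = pd.MultiIndex.from_product(
    [["California", "New York", "Texas"], [2000, 2010]],
    names=["state", "year"],
)
pop = pd.Series(
    [33871648, 37253956, 18976457, 19378102, 20851820, 25145561],
    index=index,
    name="population",
)

# Series case: one tuple with one selector per index level.
series_result = pop.loc[(["California", "Texas"], [2010])]

# DataFrame case: the same tuple selects rows, and ":" keeps every column.
pop_df = pop.to_frame()
frame_result = pop_df.loc[(["California", "Texas"], [2010]), :]

print(series_result)
print(frame_result)
# Both selections contain 37253956 (California) and 25145561 (Texas).
```

The tuple is what tells .loc that the two selectors belong to two levels of the same axis; without it, the second selector is read as a column key.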

Below is a more elaborated version.


# import packages & set seed
import numpy as np
import pandas as pd
np.random.seed(42)

Create example dataframe

Using the example from the pandas docs:

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]

s = pd.Series(np.random.randn(8), index=arrays)

df = pd.DataFrame(np.random.randn(8, 4), index=arrays)

This will look like:

[Image: the resulting DataFrame df]


Slicing using multiindex

With df you can do:

df.loc[(['qux','foo'], 'one'), :]

For notational consistency you can also wrap the second element of the tuple in []:

df.loc[(['qux','foo'], ['one']), :]

Both forms return the same rows:

[Image: the rows of df whose first level is 'qux' or 'foo' and whose second level is 'one']

(Selecting 'one' here corresponds to selecting 2010 in your data, and ['qux','foo'] corresponds to ['California','Texas'], so the same steps should carry over to your DataFrame.)
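A quick way to check that the scalar and list forms pick out the same rows is sketched below. It rebuilds the example frame and sorts both results before comparing, so that the check does not depend on the order in which the rows come back. The variable names with_scalar and with_list are only illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(42)

# Same two-level index as in the pandas docs example above.
arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)

# Second level given as a scalar, then as a one-element list.
with_scalar = df.loc[(['qux', 'foo'], 'one'), :]
with_list = df.loc[(['qux', 'foo'], ['one']), :]

# Sort both before comparing so row order does not matter.
same_rows = with_scalar.sort_index().equals(with_list.sort_index())
print(same_rows)
print(with_list.sort_index())
```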


This might also be helpful.

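I would do something like this, filtering on named columns; it can be extended by adding more conditions with &. This assumes the index levels have first been turned into columns, for example with df = df.reset_index(), which names the levels of an unnamed MultiIndex level_0 and level_1. As above, 2010 is unquoted on the assumption that the years are integers.

df.loc[(df['level_0'].isin(['California', 'Texas'])) & (df['level_1'].isin([2010]))]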
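A minimal sketch of this mask-based approach on the question's data follows. The level names state and year, the column name population, and the integer years are assumptions. The second variant builds the same mask directly from the index levels, which avoids the reset_index() step.

```python
import pandas as pd

# Rebuild the data from the question (names and integer years are assumed).
index = pd.MultiIndex.from_product(
    [["California", "New York", "Texas"], [2000, 2010]],
    names=["state", "year"],
)
pop_df = pd.DataFrame(
    {"population": [33871648, 37253956, 18976457, 19378102, 20851820, 25145561]},
    index=index,
)

# Variant 1: move the index levels into ordinary columns, then filter with isin.
flat = pop_df.reset_index()
flat_mask = flat["state"].isin(["California", "Texas"]) & flat["year"].isin([2010])
flat_result = flat.loc[flat_mask]

# Variant 2: build the same boolean mask from the index levels directly.
level_mask = (
    pop_df.index.get_level_values("state").isin(["California", "Texas"])
    & pop_df.index.get_level_values("year").isin([2010])
)
level_result = pop_df.loc[level_mask]

print(flat_result)
print(level_result)
```

Variant 1 returns a frame with a plain integer index and the levels as columns, while variant 2 keeps the original MultiIndex; which is preferable depends on what happens to the result next.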
