Setting Multiple Layers of a Multiindex Series

Question

TLDR: How do you set values in a MultiIndex Series by slicing on any level? I got it to work when slicing on the outermost level, but not when slicing along a "middle" level.

Suppose you have a two- or three-level MultiIndex Series that looks as follows:

_s01_|_s02_|_s03_|____
 'a' | 'c' | 'n' | 0.0
           | 'm' | 0.1
           | 'o' | 0.2
     | 'd' | 'n' | 0.3
           | 'o' | 0.4
 'b' | 'c' | 'n' | 0.5
        .........

Here is what I'm currently trying to do:

r = pd.Series(0, index=data.index)  # create a similarly structured Series
for i in data.index.levels[1]:
    d = data.loc[(slice(None), i, slice(None))]
    # manipulate values in d
    r.loc[(slice(None), i, slice(None))] = d

This just sets all of the r values covered by the slice to NaN.

Is there a universal way to get a VIEW into a MultiIndex Series and set values through it? I was trying something very similar with a DataFrame; there the problem was that .loc was dropping levels, so the indices no longer matched. I fixed it there by switching to the syntax that I am now attempting to use with the Series.

Any help would be greatly appreciated.

Answer 1

Pandas recommends using pd.IndexSlice or similar syntax rather than slice() (see the pandas documentation on slicers). For example, explicitly:

idx = pd.IndexSlice
series.loc[idx[:, 'c', :]]
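
As a minimal sketch of what the idx shortcut does, the snippet below compares the key it builds with a hand-written tuple of slice() objects. The variable names are illustrative only.

```python
import pandas as pd

idx = pd.IndexSlice

# IndexSlice only packages the slicing syntax; it hands back an ordinary tuple.
key = idx[:, "c", :]
manual_key = (slice(None), "c", slice(None))

same_key = key == manual_key
print(key)
print(same_key)
```

So the two spellings select the same rows; pd.IndexSlice is simply easier to read and to type.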

You can skip the idx shortcut if you only want the full entries of the selected rows: series.loc[:, 'c', :] (this is essentially what happens with simple indexing).

However, it's better to use pd.IndexSlice, and it becomes necessary when you index into a DataFrame.

Say we have your Series

series

>  s01  s02  s03
a    c    n      1
          m      0
          o      4
     d    n      6
          o      9
b    c    n      4
dtype: int64
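
For anyone who wants to follow along, here is one way to build such a Series. This is a sketch: the construction via MultiIndex.from_tuples is my assumption, only the labels and values come from the printout above.

```python
import pandas as pd

# Same labels and values as the printed Series, in the same (unsorted) order.
index = pd.MultiIndex.from_tuples(
    [
        ("a", "c", "n"),
        ("a", "c", "m"),
        ("a", "c", "o"),
        ("a", "d", "n"),
        ("a", "d", "o"),
        ("b", "c", "n"),
    ],
    names=["s01", "s02", "s03"],
)
series = pd.Series([1, 0, 4, 6, 9, 4], index=index)
print(series)
```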

Indexing on Multilevel indexes in pd.Series and pd.DataFrame

Key part

To do indexing, we need to first lexsort the series index:

series.sort_index(inplace = True)
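
A small sketch of the sorting step, assuming the sample Series from above (rebuilt here so the snippet runs on its own). It checks the index order before and after sorting; sort_index() is used without inplace, which is just a style choice.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [
        ("a", "c", "n"),
        ("a", "c", "m"),
        ("a", "c", "o"),
        ("a", "d", "n"),
        ("a", "d", "o"),
        ("b", "c", "n"),
    ],
    names=["s01", "s02", "s03"],
)
series = pd.Series([1, 0, 4, 6, 9, 4], index=index)

# "n" comes before "m" in the raw data, so the index is not sorted yet.
sorted_before = series.index.is_monotonic_increasing

series = series.sort_index()
sorted_after = series.index.is_monotonic_increasing

print(sorted_before, sorted_after)
```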

Then define a pd.IndexSlice object, which builds the selection that is passed to .loc:

idx = pd.IndexSlice
# do your indexing
series.loc[idx[:,'c',:]]

Details

Bare colons are only valid directly inside the square brackets; wrapping them in a list or tuple without pd.IndexSlice is a syntax error:

On a Series:

series.loc[[:,'c',:]] will give you:

File "<ipython-input-101-21968807c1d1>", line 1
    df.loc[[:,'c',:]]
        ^
SyntaxError: invalid syntax


# with IndexSlice
idx = pd.IndexSlice
series.loc[idx[:,'c',:]]

>  s01  s03
a    n      1
     m      0
     o      4
b    n      4
dtype: int64
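
Here is a runnable sketch of that selection, assuming the sample values from above. It also checks that the shorthand without idx returns the same rows on a Series. The printed index may look different depending on your pandas version, so the snippet only compares values.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [
        ("a", "c", "n"),
        ("a", "c", "m"),
        ("a", "c", "o"),
        ("a", "d", "n"),
        ("a", "d", "o"),
        ("b", "c", "n"),
    ],
    names=["s01", "s02", "s03"],
)
series = pd.Series([1, 0, 4, 6, 9, 4], index=index).sort_index()

idx = pd.IndexSlice
selected = series.loc[idx[:, "c", :]]  # every row whose s02 label is "c"
shorthand = series.loc[:, "c", :]      # same key, written without idx

print(selected)
```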

If we have a pd.DataFrame, we do a similar thing.

Say we have the following pd.DataFrame:

df
>              hello animal   i_like
s01 s02 s03                       
a   c   m        0  Goose  dislike
        n        1  Panda     like
        o        4  Tiger     like
    d   n        6  Goose     like
        o        9   Bear  dislike
b   c   n        4    Dog  dislike

To index:

df.sort_index(inplace = True) # need to lexsort for indexing

# without pd.IndexSlice
df.loc[(:,'c',:)]   # bare colons inside parentheses are not valid Python
File "<ipython-input-118-9544c9b9f9da>", line 1
df.loc[(:,'c',:)]
        ^
SyntaxError: invalid syntax

# with pd.IndexSlice
idx = pd.IndexSlice
df.loc[idx[:,'c',:],:]

>             hello animal   i_like
s01 s02 s03                       
a   c   m        0  Goose  dislike
        n        1  Panda     like
        o        4  Tiger     like
b   c   n        4    Dog  dislike

and for specific columns

df.loc[idx[:,'d',:],['hello','animal']]

>              hello animal
s01 s02 s03              
a   d   n        6  Goose
        o        9   Bear

Setting values

If you'd like to set values on your selection, assign to the same .loc selection:

For a Series:

my_select = series.loc[idx[:,'c',:]]   # a Series has no column axis, so no trailing ",:"
series.loc[idx[:,'c',:]] = my_select.apply(lambda x: x*3)

series
> s01  s02  s03
a    c    m       0
          n       3
          o      12
     d    n       6
          o       9
b    c    n      12
dtype: int64
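
Applied to the loop from the question, a sketch could look like the following. The names data and r follow the question; the "manipulation" (subtracting the group mean) is a made-up placeholder. The values are assigned as a plain NumPy array, so they are written by position and no index alignment can turn them into NaN. That choice is mine, not something the question or the pandas docs prescribe.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [
        ("a", "c", "n"),
        ("a", "c", "m"),
        ("a", "c", "o"),
        ("a", "d", "n"),
        ("a", "d", "o"),
        ("b", "c", "n"),
    ],
    names=["s01", "s02", "s03"],
)
data = pd.Series([1.0, 0.0, 4.0, 6.0, 9.0, 4.0], index=index).sort_index()
r = pd.Series(0.0, index=data.index)  # same structure as data

idx = pd.IndexSlice
for label in data.index.levels[1]:  # "c", then "d"
    key = idx[:, label, :]
    d = data.loc[key]
    d = d - d.mean()  # hypothetical manipulation of the slice
    # Same key on both sides, so the rows line up one to one by position.
    r.loc[key] = d.to_numpy()

print(r)
```

If the slice keeps all of its index levels, assigning d directly should work as well; going through to_numpy() just removes the dependence on alignment.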

For a DataFrame:

my_select = df.loc[idx[:,'d',:],:]
df.loc[idx[:,'d',:],['i_like']] = my_select.apply(
      lambda x: "dislike" if x.hello<5 else "like", axis=1)

df
>             hello animal   i_like
s01 s02 s03                       
a   c   m        0  Goose  dislike
        n        1  Panda     like
        o        4  Tiger     like
    d   n        6  Goose     like
        o        9   Bear     like
b   c   n        4    Dog  dislike

# Only the 'd' rows are touched: Bear is changed to "like", Goose was already "like".
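
A self-contained sketch of setting one column on a middle-level slice is given below. It assumes the sample frame from above and targets the column by name ('i_like' rather than ['i_like']), which is my simplification. Only the rows with s02 == "d" are modified.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [
        ("a", "c", "m"),
        ("a", "c", "n"),
        ("a", "c", "o"),
        ("a", "d", "n"),
        ("a", "d", "o"),
        ("b", "c", "n"),
    ],
    names=["s01", "s02", "s03"],
)
df = pd.DataFrame(
    {
        "hello": [0, 1, 4, 6, 9, 4],
        "animal": ["Goose", "Panda", "Tiger", "Goose", "Bear", "Dog"],
        "i_like": ["dislike", "like", "like", "like", "dislike", "dislike"],
    },
    index=index,
).sort_index()

idx = pd.IndexSlice
rows = idx[:, "d", :]

my_select = df.loc[rows, :]
new_labels = my_select["hello"].map(lambda v: "dislike" if v < 5 else "like")

# new_labels carries the same MultiIndex as the selected rows, so it aligns.
df.loc[rows, "i_like"] = new_labels

print(df)
```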

PS. Note commas/colons (or lack thereof)!

Hope this helps!

solution1
2 ACCEPTED 2017-05-26 11:51:02
