Multi-indexed dataframe: Setting values

Question

I already asked a related question earlier, but I didn't want to start a comment-and-edit discussion. So here is, boiled down, what the answer to my earlier question led me to ask. Consider

import pandas as pd
from numpy import arange
from numpy import random  # scipy.random was an alias of numpy.random; recent SciPy releases no longer provide it

index = pd.MultiIndex.from_product([arange(0,3), arange(10,15)], names=['A', 'B'])
df = pd.DataFrame(columns=['test'], index=index)
someValues = random.randint(0, 10, size=5)

df.loc[0, 'test'], df.loc[0,:] and df.ix[0] all create a representation of a part of the data frame, the first one being a Series and the other two being df slices. However

df.ix[0] = df.loc[0,'test'] = someValues sets the value for the df
df.loc[0,'test'] = someValues gives an error ValueError: total size of new array must be unchanged
df.loc[0,:] = someValues is being ignored. No error, but the df does not contain the numpy array.

I skimmed the docs, but there was no clear, logical and systematic explanation of what is going on with MultiIndexes in general. So far, I guess that "if the view is a Series, you can set values", and "otherwise, god knows what happens".

Could someone shed some light on the logic? Moreover, is there some deep meaning behind this or are these just constraints due to how it is set up?

Answer 1

These were all run with pandas 0.13.1.

These are not all 'slice' representations at all.

This is a Series.

In [50]: df.loc[0,'test']
Out[50]: 
B
10    NaN
11    NaN
12    NaN
13    NaN
14    NaN
Name: test, dtype: object

These are DataFrames (and identical to each other).

In [51]: df.loc[0,:]
Out[51]: 
   test
B      
10  NaN
11  NaN
12  NaN
13  NaN
14  NaN

[5 rows x 1 columns]

In [52]: df.ix[0]
Out[52]: 
   test
B      
10  NaN
11  NaN
12  NaN
13  NaN
14  NaN

[5 rows x 1 columns]
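To check the Series-versus-DataFrame distinction on your own install, here is a minimal sketch that rebuilds the frame from the question and inspects what each selection returns. It only uses `.loc`, because `.ix` was removed in pandas 1.0; the variable names `as_series` and `as_frame` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same frame as in the question: 3 x 5 MultiIndex, one empty column.
index = pd.MultiIndex.from_product(
    [np.arange(0, 3), np.arange(10, 15)], names=['A', 'B']
)
df = pd.DataFrame(columns=['test'], index=index)

as_series = df.loc[0, 'test']  # row label plus a single column label
as_frame = df.loc[0, :]        # row label plus "all columns"

# Print the container type and shape of each selection.
print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```

The shapes are what matter for assignment: a one-dimensional array matches the first selection directly, while the second is two-dimensional even with a single column.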

This is trying to assign the wrong shape. It looks like it should work, but it would not if you had multiple columns, which is why it is not allowed.

In [54]: df.ix[0] = someValues
ValueError: could not broadcast input array from shape (5) into shape (5,1)
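The shape rule behind that message can be reproduced with plain NumPy. This is only an illustration of the broadcasting rule, not a claim about what pandas does internally in any particular version; `target` is a hypothetical stand-in for the two-dimensional data behind a one-column selection.

```python
import numpy as np

target = np.zeros((5, 1))  # 2-D: five rows, one column
values = np.arange(5)      # 1-D: shape (5,)

try:
    # A (5,) array lines up with the LAST axis of the target,
    # which has length 1, so this assignment is rejected.
    target[...] = values
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print(err)

# Giving the input an explicit column shape makes the assignment valid.
target[...] = values.reshape(-1, 1)
print(target.ravel())
```

A one-dimensional array is matched against the columns axis, so with more than one column it would be ambiguous whether the values run down the rows or across the columns.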

This works because it knows how to broadcast

In [56]: df.loc[0,:] = someValues

In [57]: df
Out[57]: 
     test
A B      
0 10    4
  11    3
  12    4
  13    2
  14    8
1 10  NaN
  11  NaN
  12  NaN
  13  NaN
  14  NaN
2 10  NaN
  11  NaN
  12  NaN
  13  NaN
  14  NaN

[15 rows x 1 columns]

This works fine

In [63]: df.loc[0,'test'] = someValues+1

In [64]: df
Out[64]: 
     test
A B      
0 10    5
  11    4
  12    5
  13    3
  14    9
1 10  NaN
  11  NaN
  12  NaN
  13  NaN
  14  NaN
2 10  NaN
  11  NaN
  12  NaN
  13  NaN
  14  NaN

[15 rows x 1 columns]

As does this

In [66]: df.loc[0,:] = someValues+1

In [67]: df
Out[67]: 
     test
A B      
0 10    5
  11    4
  12    5
  13    3
  14    9
1 10  NaN
  11  NaN
  12  NaN
  13  NaN
  14  NaN
2 10  NaN
  11  NaN
  12  NaN
  13  NaN
  14  NaN

[15 rows x 1 columns]
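For readers on a current pandas, where `.ix` is gone, the Series-style assignment can be sketched with `.loc` alone. This assumes the same frame as in the question; `some_values` is fixed to the numbers shown in the first output above, so that the result is reproducible rather than random.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [np.arange(0, 3), np.arange(10, 15)], names=['A', 'B']
)
df = pd.DataFrame(columns=['test'], index=index)

# Fixed values instead of random ones, for reproducibility.
some_values = np.array([4, 3, 4, 2, 8])

# Row label 0 on level A selects five rows; 'test' selects one column,
# so the target is one-dimensional and a length-5 array fits.
df.loc[0, 'test'] = some_values

print(df.loc[0, 'test'].tolist())
print(df.loc[1, 'test'].isna().all())  # other groups are untouched
```

Naming the column explicitly keeps the target one-dimensional, which avoids the shape question entirely.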

It is not clear how you generated the cases in your question. I think the logic is pretty straightforward and consistent (there were several inconsistencies in prior versions, however).

1 answers

solution1
2 ACCPTED 2014-05-20 16:01:41
