
Multi-indexed dataframe: Setting values

I asked a related question earlier, but I didn't want to start a long comment-and-edit discussion there. So here, boiled down, is what the answer to my earlier question led me to ask. Consider:

import pandas as pd
from numpy import arange
from numpy import random  # scipy.random was only an alias for numpy.random

index = pd.MultiIndex.from_product([arange(0,3), arange(10,15)], names=['A', 'B'])
df = pd.DataFrame(columns=['test'], index=index)
someValues = random.randint(0, 10, size=5)

df.loc[0, 'test'], df.loc[0,:] and df.ix[0] all create a representation of part of the data frame, the first being a Series and the other two being df slices. However:

  • df.ix[0] = df.loc[0,'test'] = someValues sets the value for the df
  • df.loc[0,'test'] = someValues gives an error ValueError: total size of new array must be unchanged
  • df.loc[0,:] = someValues is being ignored. No error, but the df does not contain the numpy array.

I skimmed the docs, but found no clear, logical, and systematic explanation of what is going on with MultiIndexes in general. So far, my guess is that "if the view is a Series, you can set values", and "otherwise, god knows what happens".

Could someone shed some light on the logic? Moreover, is there some deep meaning behind this or are these just constraints due to how it is set up?

These are all with pandas 0.13.1.

These are not all 'slice' representations.

This is a Series.

In [50]: df.loc[0,'test']
Out[50]: 
B
10    NaN
11    NaN
12    NaN
13    NaN
14    NaN
Name: test, dtype: object

These are DataFrames (and they are identical).

In [51]: df.loc[0,:]
Out[51]: 
   test
B      
10  NaN
11  NaN
12  NaN
13  NaN
14  NaN

[5 rows x 1 columns]

In [52]: df.ix[0]
Out[52]: 
   test
B      
10  NaN
11  NaN
12  NaN
13  NaN
14  NaN

[5 rows x 1 columns]
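
For readers on a recent pandas release, where `.ix` has been removed, the same distinction can be checked directly. This is a minimal sketch and not part of the original 0.13.1 session. It assumes a current pandas and rebuilds the frame from the question. The names `as_series` and `as_frame` are only illustrative.

```python
import pandas as pd

# Rebuild the frame from the question (every value starts as NaN).
index = pd.MultiIndex.from_product([range(3), range(10, 15)], names=["A", "B"])
df = pd.DataFrame(columns=["test"], index=index)

as_series = df.loc[0, "test"]  # outer-level key plus one column label
as_frame = df.loc[0, :]        # outer-level key plus all columns

# The first selection is one-dimensional, the second is two-dimensional.
print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```

The difference in dimensionality is what matters for assignment: a 1-D array matches the Series selection directly, but not necessarily the DataFrame selection.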

This tries to assign a value of the wrong shape. It looks like it should work, but it would not if the frame had multiple columns, which is why it is not allowed.

In [54]: df.ix[0] = someValues
ValueError: could not broadcast input array from shape (5) into shape (5,1)
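
The shape mismatch in that error can be reproduced with plain NumPy, independent of pandas. The following is a sketch under that assumption: `target_one_col` and `target_two_cols` are hypothetical stand-ins for the block of values behind a one-column and a two-column frame slice, and they are not pandas internals.

```python
import numpy as np

values = np.arange(5)               # shape (5,), like someValues
target_one_col = np.empty((5, 1))   # stand-in for a 5-row, 1-column slice
target_two_cols = np.empty((5, 2))  # stand-in for a 5-row, 2-column slice

# A (5,) array is treated as a single row of 5, so it cannot fill 5 rows of 1.
try:
    target_one_col[...] = values
    one_col_failed = False
except ValueError:
    one_col_failed = True

# The same 1-D array is just as ambiguous for two columns.
try:
    target_two_cols[...] = values
    two_cols_failed = False
except ValueError:
    two_cols_failed = True

# An explicit column vector of shape (5, 1) broadcasts into both targets.
column = values.reshape(-1, 1)
target_one_col[...] = column
target_two_cols[...] = column
```

Reshaping to a column makes the intent explicit: one value per row, repeated across however many columns the target has.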

This works because it knows how to broadcast

In [56]: df.loc[0,:] = someValues

In [57]: df
Out[57]: 
     test
A B      
0 10    4
  11    3
  12    4
  13    2
  14    8
1 10  NaN
  11  NaN
  12  NaN
  13  NaN
  14  NaN
2 10  NaN
  11  NaN
  12  NaN
  13  NaN
  14  NaN

[15 rows x 1 columns]

This works fine

In [63]: df.loc[0,'test'] = someValues+1

In [64]: df
Out[64]: 
     test
A B      
0 10    5
  11    4
  12    5
  13    3
  14    9
1 10  NaN
  11  NaN
  12  NaN
  13  NaN
  14  NaN
2 10  NaN
  11  NaN
  12  NaN
  13  NaN
  14  NaN

[15 rows x 1 columns]

As does this

In [66]: df.loc[0,:] = someValues+1

In [67]: df
Out[67]: 
     test
A B      
0 10    5
  11    4
  12    5
  13    3
  14    9
1 10  NaN
  11  NaN
  12  NaN
  13  NaN
  14  NaN
2 10  NaN
  11  NaN
  12  NaN
  13  NaN
  14  NaN

[15 rows x 1 columns]
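
As a sketch for current pandas versions (an assumption, since the session above is from 0.13.1), naming both the outer-level key and the column selects a Series, so a 1-D array of matching length can be assigned without any shape ambiguity. The fixed array `some_values` replaces the random `someValues` from the question so the result is reproducible.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
index = pd.MultiIndex.from_product([range(3), range(10, 15)], names=["A", "B"])
df = pd.DataFrame(columns=["test"], index=index)

# Fixed values instead of random ones, one per row under A == 0.
some_values = np.array([4, 3, 4, 2, 8])

# Outer-level key 0 plus the column label selects five cells of one column.
df.loc[0, "test"] = some_values

print(df.loc[0, "test"].tolist())
print(df.loc[1, "test"].isna().all())
```

Rows under the other outer-level keys are left untouched and stay NaN.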

It's not clear how you generated the cases in your question. I think the logic is pretty straightforward and consistent (there were several inconsistencies in prior versions, however).

