
Setting varying columns for a subset of rows in a pandas multiindex dataframe

I want to reassign values in specific rows and varying multi-index columns of a large pandas dataframe, df, using the non-NaN values that have been calculated and stored in a slightly smaller masked subset of the dataframe, df_sub.

df =
    A                                                           B        
      0     1     2     3     4     5     6     7     8     9      0     1     2     3     4     5     6     7     8     9        
0   1.0   2.0   3.0   4.0   5.0   6.0   7.0   8.0   9.0  10.0  -51.0 -50.0 -49.0 -48.0 -47.0 -46.0 -45.0 -44.0 -43.0 -42.0   
1  11.0  12.0  13.0  14.0  15.0  16.0  17.0  18.0  19.0  20.0  -41.0 -40.0 -39.0 -38.0 -37.0 -36.0 -35.0 -34.0 -33.0 -32.0   
2  21.0  22.0  23.0  24.0  25.0  26.0  27.0  28.0  29.0  30.0  -31.0 -30.0 -29.0 -28.0 -27.0 -26.0 -25.0 -24.0 -23.0 -22.0   
3  31.0  32.0  33.0  34.0  35.0  36.0  37.0  38.0  39.0  40.0  -21.0 -20.0 -19.0 -18.0 -17.0 -16.0 -15.0 -14.0 -13.0 -12.0   
4  41.0  42.0  43.0  44.0  45.0  46.0  47.0  48.0  49.0  50.0  -11.0 -10.0  -9.0  -8.0  -7.0  -6.0  -5.0  -4.0  -3.0  -2.0  

df_sub =
      0     1     2     3     4     5     6     7     8     9 
1    NaN   NaN   NaN   NaN   NaN   0.5   0.6   0.7   NaN   NaN
3    NaN   NaN   NaN   0.3   0.4   0.5   NaN   NaN   NaN   NaN

My goal is to get the result shown below for df.loc[:,'B'], where the non-NaN values in df_sub replace the values in the corresponding rows and columns of df (i.e., df.loc[1, pd.IndexSlice['B', 5:7]] = df_sub.loc[1, 5:7] and df.loc[3, pd.IndexSlice['B', 3:5]] = df_sub.loc[3, 3:5]):

df.loc[:,'B'] =
      0     1     2     3     4     5     6     7     8     9
0 -51.0 -50.0 -49.0 -48.0 -47.0 -46.0 -45.0 -44.0 -43.0 -42.0
1 -41.0 -40.0 -39.0 -38.0 -37.0   0.5   0.6   0.7 -33.0 -32.0
2 -31.0 -30.0 -29.0 -28.0 -27.0 -26.0 -25.0 -24.0 -23.0 -22.0
3 -21.0 -20.0 -19.0   0.3   0.4   0.5 -15.0 -14.0 -13.0 -12.0
4 -11.0 -10.0  -9.0  -8.0  -7.0  -6.0  -5.0  -4.0  -3.0  -2.0

However, rather than getting the desired values, I am getting NaNs:

df.loc[:,'B'] =
      0     1     2     3     4     5     6     7     8     9
0 -51.0 -50.0 -49.0 -48.0 -47.0 -46.0 -45.0 -44.0 -43.0 -42.0
1 -41.0 -40.0 -39.0 -38.0 -37.0   NaN   NaN   NaN -33.0 -32.0
2 -31.0 -30.0 -29.0 -28.0 -27.0 -26.0 -25.0 -24.0 -23.0 -22.0
3 -21.0 -20.0 -19.0   NaN   NaN   NaN -15.0 -14.0 -13.0 -12.0
4 -11.0 -10.0  -9.0  -8.0  -7.0  -6.0  -5.0  -4.0  -3.0  -2.0

My simple sample code is included below. From the diagnostics, it looks like everything is behaving as expected: 1) the non-NaN values and their indices are identified for each row of df_sub, 2) the slicing of the original df appears to be correct, and 3) the assignment is made without a complaint or a SettingWithCopyWarning.

  1. What is the appropriate way to accomplish my goal?
  2. Why is this failing?
  3. Is there a more compact, efficient way to perform the assignments?

Simplified example:

# Create data for example case
import numpy as np
import pandas as pd

idf = pd.MultiIndex.from_product([['A', 'B'], np.arange(0, 10)])
df = pd.DataFrame(np.concatenate((np.arange(1., 51.).reshape(5, 10),
                                  np.arange(-51., -1.).reshape(5, 10)), axis=1),
                  index=np.arange(0, 5), columns=idf)
df_sub = pd.DataFrame([[np.nan, np.nan, np.nan, np.nan, np.nan, 0.5, 0.6, 0.7, np.nan, np.nan],
                       [np.nan, np.nan, np.nan, 0.3, 0.4, 0.5, np.nan, np.nan, np.nan, np.nan]],
                      index=[1, 3], columns=np.arange(0, 10))
dfsub_idx = df_sub.index

# Perform assignments (this is the version that produces NaNs)
for (idx, row) in df_sub.iterrows():
    arr = row.index[~row.isnull()]  # column labels of the non-NaN entries
    print('row {}: \n{}'.format(idx, row))
    print('non-nan indices: {}\n'.format(arr))
    print('df before mod: \n{}'.format(df.loc[idx, pd.IndexSlice['B', arr.tolist()]]))
    df.loc[idx, pd.IndexSlice['B', arr.tolist()]] = row[arr]  # row[arr] is a Series
    print('df after mod: \n{}'.format(df.loc[idx, pd.IndexSlice['B', arr.tolist()]]))

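You should add .values at the end of the df_sub selection, after .loc. Without it, the right-hand side is a Series whose labels (5, 6, 7) are aligned against the MultiIndex column labels (('B', 5), ('B', 6), ('B', 7)); nothing matches, so NaN is written. With .values, a plain NumPy array is assigned by position: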

df.loc[1, pd.IndexSlice['B', 5:7]] = df_sub.loc[1, 5:7].values 
df.loc[3, pd.IndexSlice['B', 3:5]] = df_sub.loc[3, 3:5].values
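
The same idea can be applied to every row of df_sub in a loop, which avoids hard-coding the slices. This is only a sketch, assuming the df and df_sub built in the question's example; the variable name cols is illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

# Rebuild the example data from the question
idf = pd.MultiIndex.from_product([['A', 'B'], np.arange(0, 10)])
df = pd.DataFrame(np.concatenate((np.arange(1., 51.).reshape(5, 10),
                                  np.arange(-51., -1.).reshape(5, 10)), axis=1),
                  index=np.arange(0, 5), columns=idf)
df_sub = pd.DataFrame([[np.nan, np.nan, np.nan, np.nan, np.nan, 0.5, 0.6, 0.7, np.nan, np.nan],
                       [np.nan, np.nan, np.nan, 0.3, 0.4, 0.5, np.nan, np.nan, np.nan, np.nan]],
                      index=[1, 3], columns=np.arange(0, 10))

for idx, row in df_sub.iterrows():
    # Second-level column labels where this row of df_sub holds a value
    cols = row.index[row.notna()].tolist()
    # to_numpy() strips the labels, so no alignment (and no NaN) occurs
    df.loc[idx, pd.IndexSlice['B', cols]] = row.loc[cols].to_numpy()

print(df['B'])
```

Because the labels are dropped, the values are written purely by position, so the order of cols has to match the order of the selected columns (it does here, since both come from the same sorted column index).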

Inline with pandas.DataFrame.align and pandas.DataFrame.fillna

By using the level argument

pd.DataFrame.fillna(*df_sub.align(df, level=1))

      A                                                           B                                                      
      0     1     2     3     4     5     6     7     8     9     0     1     2     3     4     5     6     7     8     9
0   1.0   2.0   3.0   4.0   5.0   6.0   7.0   8.0   9.0  10.0 -51.0 -50.0 -49.0 -48.0 -47.0 -46.0 -45.0 -44.0 -43.0 -42.0
1  11.0  12.0  13.0  14.0  15.0   0.5   0.6   0.7  19.0  20.0 -41.0 -40.0 -39.0 -38.0 -37.0   0.5   0.6   0.7 -33.0 -32.0
2  21.0  22.0  23.0  24.0  25.0  26.0  27.0  28.0  29.0  30.0 -31.0 -30.0 -29.0 -28.0 -27.0 -26.0 -25.0 -24.0 -23.0 -22.0
3  31.0  32.0  33.0   0.3   0.4   0.5  37.0  38.0  39.0  40.0 -21.0 -20.0 -19.0   0.3   0.4   0.5 -15.0 -14.0 -13.0 -12.0
4  41.0  42.0  43.0  44.0  45.0  46.0  47.0  48.0  49.0  50.0 -11.0 -10.0  -9.0  -8.0  -7.0  -6.0  -5.0  -4.0  -3.0  -2.0

In place with update

df.update(df_sub.align(df, level=1)[0])

Clarification

This:

pd.DataFrame.fillna(*df_sub.align(df, level=1))

Is equivalent to

a, b = df_sub.align(df, level=1)
a.fillna(b)
# Or pd.DataFrame.fillna(a, b)
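
Note that, as the printed result shows, aligning on level=1 writes the df_sub values into both the A and B blocks. If only B should change, as in the question, one possible sketch is to update the B sub-frame on its own and write it back. It assumes the df and df_sub from the question; the name b is illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example data from the question
idf = pd.MultiIndex.from_product([['A', 'B'], np.arange(0, 10)])
df = pd.DataFrame(np.concatenate((np.arange(1., 51.).reshape(5, 10),
                                  np.arange(-51., -1.).reshape(5, 10)), axis=1),
                  index=np.arange(0, 5), columns=idf)
df_sub = pd.DataFrame([[np.nan, np.nan, np.nan, np.nan, np.nan, 0.5, 0.6, 0.7, np.nan, np.nan],
                       [np.nan, np.nan, np.nan, 0.3, 0.4, 0.5, np.nan, np.nan, np.nan, np.nan]],
                      index=[1, 3], columns=np.arange(0, 10))

# df['B'] has plain columns 0..9, the same labels as df_sub
b = df['B'].copy()
# update() overwrites b in place with the non-NaN values of df_sub,
# matching on row and column labels
b.update(df_sub)
# Write the block back by position; the A block is left untouched
df.loc[:, 'B'] = b.to_numpy()

print(df['B'])
```

Working on the B sub-frame keeps the label matching simple, because both frames then share the same single-level column index.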
