Setting different columns for a subset of rows in a pandas MultiIndex DataFrame

Question

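I want to reassign the values in specific rows and MultiIndex columns of a pandas DataFrame df to the non-NaN values that have been computed and stored in a smaller, masked subset DataFrame df_sub.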

df =
    A                                                           B        
      0     1     2     3     4     5     6     7     8     9      0     1     2     3     4     5     6     7     8     9        
0   1.0   2.0   3.0   4.0   5.0   6.0   7.0   8.0   9.0  10.0  -51.0 -50.0 -49.0 -48.0 -47.0 -46.0 -45.0 -44.0 -43.0 -42.0   
1  11.0  12.0  13.0  14.0  15.0  16.0  17.0  18.0  19.0  20.0  -41.0 -40.0 -39.0 -38.0 -37.0 -36.0 -35.0 -34.0 -33.0 -32.0   
2  21.0  22.0  23.0  24.0  25.0  26.0  27.0  28.0  29.0  30.0  -31.0 -30.0 -29.0 -28.0 -27.0 -26.0 -25.0 -24.0 -23.0 -22.0   
3  31.0  32.0  33.0  34.0  35.0  36.0  37.0  38.0  39.0  40.0  -21.0 -20.0 -19.0 -18.0 -17.0 -16.0 -15.0 -14.0 -13.0 -12.0   
4  41.0  42.0  43.0  44.0  45.0  46.0  47.0  48.0  49.0  50.0  -11.0 -10.0  -9.0  -8.0  -7.0  -6.0  -5.0  -4.0  -3.0  -2.0  

df_sub =
      0     1     2     3     4     5     6     7     8     9 
1    NaN   NaN   NaN   NaN   NaN   0.3   0.2   0.1   NaN   NaN
3    NaN   NaN   NaN   0.6   0.9   0.7   NaN   NaN   NaN   NaN

My goal is for df.loc[:, 'B'] to look as follows, with the non-NaN values in df_sub replacing the corresponding rows and columns of df (i.e., df.loc[1, pd.IndexSlice['B', 5:7]] = df_sub.loc[1, 5:7] and df.loc[3, pd.IndexSlice['B', 3:5]] = df_sub.loc[3, 3:5]):

df.loc[:,'B'] =
      0     1     2     3     4     5     6     7     8     9
0 -51.0 -50.0 -49.0 -48.0 -47.0 -46.0 -45.0 -44.0 -43.0 -42.0
1 -41.0 -40.0 -39.0 -38.0 -37.0   0.3   0.2   0.1 -33.0 -32.0
2 -31.0 -30.0 -29.0 -28.0 -27.0 -26.0 -25.0 -24.0 -23.0 -22.0
3 -21.0 -20.0 -19.0   0.6   0.9   0.7 -15.0 -14.0 -13.0 -12.0
4 -11.0 -10.0  -9.0  -8.0  -7.0  -6.0  -5.0  -4.0  -3.0  -2.0

However, what I get is NaN:

df.loc[:,'B'] =
      0     1     2     3     4     5     6     7     8     9
0 -51.0 -50.0 -49.0 -48.0 -47.0 -46.0 -45.0 -44.0 -43.0 -42.0
1 -41.0 -40.0 -39.0 -38.0 -37.0   NaN   NaN   NaN -33.0 -32.0
2 -31.0 -30.0 -29.0 -28.0 -27.0 -26.0 -25.0 -24.0 -23.0 -22.0
3 -21.0 -20.0 -19.0   NaN   NaN   NaN -15.0 -14.0 -13.0 -12.0
4 -11.0 -10.0  -9.0  -8.0  -7.0  -6.0  -5.0  -4.0  -3.0  -2.0

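My simple example code is included below. From the diagnostics, everything appears to work as expected: 1) the non-NaN values of df_sub and their indices are identified for each row of df_sub, 2) the slice of the original df appears to be correct, and 3) the assignment raises no complaint or SettingWithCopy warning.

What is a suitable way to achieve my goal?
Why does this fail?
Is there a more compact, more efficient way to perform the assignment?

Simplified example: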

import numpy as np
import pandas as pd

# Create data for example case
# (the sample values in df_sub here differ from the df_sub printed above)
idf = pd.MultiIndex.from_product([['A', 'B'], np.arange(0,10)])
df = pd.DataFrame(np.concatenate((np.arange(1.,51.).reshape(5,10), 
                  np.arange(-51., -1.).reshape(5,10)), axis=1), 
                  index=np.arange(0,5), columns=idf)
df_sub = pd.DataFrame([[np.nan, np.nan, np.nan, np.nan, np.nan, 0.5, 0.6, 0.7, np.nan, np.nan], 
                      [np.nan, np.nan, np.nan, 0.3, 0.4, 0.5, np.nan, np.nan, np.nan, np.nan]],
                      index=[1,3], columns=np.arange(0,10))
dfsub_idx = df_sub.index

# Perform assignments
for (idx, row) in df_sub.iterrows():
   arr = row.index[~row.isnull()]  # column labels that hold non-NaN values
   print('row {}: \n{}'.format(idx, row))
   print('non-nan indices: {}\n'.format(arr))
   print('df before mod: \n{}'.format(df.loc[idx, pd.IndexSlice['B', arr.tolist()]]))
   # This is the assignment that leaves NaN in df instead of the values in row[arr]
   df.loc[idx, pd.IndexSlice['B', arr.tolist()]] = row[arr]
   print('df after mod: \n{}'.format(df.loc[idx, pd.IndexSlice['B', arr.tolist()]]))

Answer 1

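You should add .values to the end of the .loc selection on df_sub. This drops the index from the right-hand side, so pandas assigns the values by position instead of trying to align their labels with the MultiIndex columns of df: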

df.loc[1, pd.IndexSlice['B', 5:7]] = df_sub.loc[1, 5:7].values 
df.loc[3, pd.IndexSlice['B', 3:5]] = df_sub.loc[3, 3:5].values
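
The two hard-coded assignments above can be generalized to any number of rows. The following is a minimal sketch, not part of the original answer: it reuses the question's loop and data, and only changes the right-hand side to a plain NumPy array (via `to_numpy()`, which plays the same role as `.values`). The name `valid` is introduced here for illustration.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Same example data as in the question
cols = pd.MultiIndex.from_product([["A", "B"], np.arange(10)])
df = pd.DataFrame(
    np.concatenate((np.arange(1.0, 51.0).reshape(5, 10),
                    np.arange(-51.0, -1.0).reshape(5, 10)), axis=1),
    index=np.arange(5), columns=cols)
df_sub = pd.DataFrame(
    [[nan, nan, nan, nan, nan, 0.5, 0.6, 0.7, nan, nan],
     [nan, nan, nan, 0.3, 0.4, 0.5, nan, nan, nan, nan]],
    index=[1, 3], columns=np.arange(10))

for idx, row in df_sub.iterrows():
    valid = row.dropna()  # keep only the non-NaN entries of this row
    # Assign a bare array so that no label alignment takes place
    df.loc[idx, pd.IndexSlice["B", valid.index.tolist()]] = valid.to_numpy()

print(df.loc[:, "B"])
```

Only the columns under B in rows 1 and 3 are touched; everything under A keeps its original values.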

Answer 2

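Inline, with `pandas.DataFrame.align` and `pandas.DataFrame.fillna`

By using the level parameter. Aligning on level 1 broadcasts df_sub across both top-level groups, so the values are filled in under A as well as B: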

pd.DataFrame.fillna(*df_sub.align(df, level=1))

      A                                                           B                                                      
      0     1     2     3     4     5     6     7     8     9     0     1     2     3     4     5     6     7     8     9
0   1.0   2.0   3.0   4.0   5.0   6.0   7.0   8.0   9.0  10.0 -51.0 -50.0 -49.0 -48.0 -47.0 -46.0 -45.0 -44.0 -43.0 -42.0
1  11.0  12.0  13.0  14.0  15.0   0.5   0.6   0.7  19.0  20.0 -41.0 -40.0 -39.0 -38.0 -37.0   0.5   0.6   0.7 -33.0 -32.0
2  21.0  22.0  23.0  24.0  25.0  26.0  27.0  28.0  29.0  30.0 -31.0 -30.0 -29.0 -28.0 -27.0 -26.0 -25.0 -24.0 -23.0 -22.0
3  31.0  32.0  33.0   0.3   0.4   0.5  37.0  38.0  39.0  40.0 -21.0 -20.0 -19.0   0.3   0.4   0.5 -15.0 -14.0 -13.0 -12.0
4  41.0  42.0  43.0  44.0  45.0  46.0  47.0  48.0  49.0  50.0 -11.0 -10.0  -9.0  -8.0  -7.0  -6.0  -5.0  -4.0  -3.0  -2.0

In place, with `update`

df.update(df_sub.align(df, level=1)[0])
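
If only the B block should change, as the question asks, one possible variation is to give df_sub an explicit 'B' top level before calling `update`. This is a sketch under that assumption rather than part of the original answer; the name `patch` is hypothetical, and `pd.concat` with a dict and `axis=1` is used here only to build the two-level columns.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Same example data as in the question
cols = pd.MultiIndex.from_product([["A", "B"], np.arange(10)])
df = pd.DataFrame(
    np.concatenate((np.arange(1.0, 51.0).reshape(5, 10),
                    np.arange(-51.0, -1.0).reshape(5, 10)), axis=1),
    index=np.arange(5), columns=cols)
df_sub = pd.DataFrame(
    [[nan, nan, nan, nan, nan, 0.5, 0.6, 0.7, nan, nan],
     [nan, nan, nan, 0.3, 0.4, 0.5, nan, nan, nan, nan]],
    index=[1, 3], columns=np.arange(10))

# Wrap df_sub under a top-level "B" key so its columns become ("B", 0) ... ("B", 9)
patch = pd.concat({"B": df_sub}, axis=1)

# update() modifies df in place and skips the NaN entries of patch
df.update(patch)

print(df.loc[:, "B"])
```

Because `patch` has no columns under A, that block is left untouched, and rows missing from `patch` (0, 2 and 4) are left as they were.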

Clarification

This:

pd.DataFrame.fillna(*df_sub.align(df, level=1))

is equivalent to

a, b = df_sub.align(df, level=1)
a.fillna(b)
# Or pd.DataFrame.fillna(a, b)

Answer 1
2 2018-05-14 19:22:07

Answer 2
2 2018-05-14 19:51:32