Setting varying columns for a subset of rows in a pandas multiindex dataframe
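I want to reassign the values in specific rows and MultiIndex columns of a pandas DataFrame df to the non-NaN values that have been computed and stored in a smaller, masked subset DataFrame, df_sub.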
df =
A B
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 -51.0 -50.0 -49.0 -48.0 -47.0 -46.0 -45.0 -44.0 -43.0 -42.0
1 11.0 12.0 13.0 14.0 15.0 16.0 17.0 18.0 19.0 20.0 -41.0 -40.0 -39.0 -38.0 -37.0 -36.0 -35.0 -34.0 -33.0 -32.0
2 21.0 22.0 23.0 24.0 25.0 26.0 27.0 28.0 29.0 30.0 -31.0 -30.0 -29.0 -28.0 -27.0 -26.0 -25.0 -24.0 -23.0 -22.0
3 31.0 32.0 33.0 34.0 35.0 36.0 37.0 38.0 39.0 40.0 -21.0 -20.0 -19.0 -18.0 -17.0 -16.0 -15.0 -14.0 -13.0 -12.0
4 41.0 42.0 43.0 44.0 45.0 46.0 47.0 48.0 49.0 50.0 -11.0 -10.0 -9.0 -8.0 -7.0 -6.0 -5.0 -4.0 -3.0 -2.0
df_sub =
0 1 2 3 4 5 6 7 8 9
1 NaN NaN NaN NaN NaN 0.3 0.2 0.1 NaN NaN
3 NaN NaN NaN 0.6 0.9 0.7 NaN NaN NaN NaN
My goal is to obtain the result shown below for df.loc[:, 'B'], where the non-NaN values in df_sub replace the corresponding rows and columns of df (i.e., df.loc[1, pd.IndexSlice['B', 5:7]] = df_sub.loc[1, 5:7] and df.loc[3, pd.IndexSlice['B', 3:5]] = df_sub.loc[3, 3:5]):
df.loc[:,'B'] =
0 1 2 3 4 5 6 7 8 9
0 -51.0 -50.0 -49.0 -48.0 -47.0 -46.0 -45.0 -44.0 -43.0 -42.0
1 -41.0 -40.0 -39.0 -38.0 -37.0 0.3 0.2 0.1 -33.0 -32.0
2 -31.0 -30.0 -29.0 -28.0 -27.0 -26.0 -25.0 -24.0 -23.0 -22.0
3 -21.0 -20.0 -19.0 0.6 0.9 0.7 -15.0 -14.0 -13.0 -12.0
4 -11.0 -10.0 -9.0 -8.0 -7.0 -6.0 -5.0 -4.0 -3.0 -2.0
However, what I get instead is NaN:
df.loc[:,'B'] =
0 1 2 3 4 5 6 7 8 9
0 -51.0 -50.0 -49.0 -48.0 -47.0 -46.0 -45.0 -44.0 -43.0 -42.0
1 -41.0 -40.0 -39.0 -38.0 -37.0 NaN NaN NaN -33.0 -32.0
2 -31.0 -30.0 -29.0 -28.0 -27.0 -26.0 -25.0 -24.0 -23.0 -22.0
3 -21.0 -20.0 -19.0 NaN NaN NaN -15.0 -14.0 -13.0 -12.0
4 -11.0 -10.0 -9.0 -8.0 -7.0 -6.0 -5.0 -4.0 -3.0 -2.0
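My simple example code is included below. From the diagnostics, everything appears to work as expected: 1) the non-NaN values of df_sub and their indices are identified for each row of df_sub, 2) the slice of the original df appears to be correct, and 3) the assignment raises no complaint and no "setting on a copy" warning.
Simplified example: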
import numpy as np
import pandas as pd

# Create data for example case
idf = pd.MultiIndex.from_product([['A', 'B'], np.arange(0, 10)])
df = pd.DataFrame(np.concatenate((np.arange(1., 51.).reshape(5, 10),
                                  np.arange(-51., -1.).reshape(5, 10)), axis=1),
                  index=np.arange(0, 5), columns=idf)
df_sub = pd.DataFrame([[np.nan, np.nan, np.nan, np.nan, np.nan, 0.5, 0.6, 0.7, np.nan, np.nan],
                       [np.nan, np.nan, np.nan, 0.3, 0.4, 0.5, np.nan, np.nan, np.nan, np.nan]],
                      index=[1, 3], columns=np.arange(0, 10))
dfsub_idx = df_sub.index
# Perform assignments (this is the version that produces the NaN result above)
for (idx, row) in df_sub.iterrows():
    # Column labels of the non-NaN entries in this row
    arr = row.index[~row.isnull()]
    print('row {}: \n{}'.format(idx, row))
    print('non-nan indices: {}\n'.format(arr))
    print('df before mod: \n{}'.format(df.loc[idx, pd.IndexSlice['B', arr.tolist()]]))
    df.loc[idx, pd.IndexSlice['B', arr.tolist()]] = row[arr]
    print('df after mod: \n{}'.format(df.loc[idx, pd.IndexSlice['B', arr.tolist()]]))
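You should append .values to the end of the df_sub selection, after the .loc indexer, so that the right-hand side is a plain NumPy array. Otherwise the assignment is label-aligned: the Series from df_sub carries the labels 5, 6, 7, which presumably do not match the MultiIndex column labels ('B', 5), ('B', 6), ('B', 7), so NaN is written instead.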
df.loc[1, pd.IndexSlice['B', 5:7]] = df_sub.loc[1, 5:7].values
df.loc[3, pd.IndexSlice['B', 3:5]] = df_sub.loc[3, 3:5].values
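The same fix can be applied inside the loop from the question. The following is a minimal sketch, assuming the example data from the question's code (values 0.5, 0.6, 0.7 and 0.3, 0.4, 0.5); the names cols, labels, and result are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Rebuild the example frames from the question
cols = pd.MultiIndex.from_product([['A', 'B'], np.arange(0, 10)])
df = pd.DataFrame(
    np.concatenate((np.arange(1., 51.).reshape(5, 10),
                    np.arange(-51., -1.).reshape(5, 10)), axis=1),
    index=np.arange(0, 5), columns=cols)
df_sub = pd.DataFrame(
    [[np.nan] * 5 + [0.5, 0.6, 0.7] + [np.nan] * 2,
     [np.nan] * 3 + [0.3, 0.4, 0.5] + [np.nan] * 4],
    index=[1, 3], columns=np.arange(0, 10))

for idx, row in df_sub.iterrows():
    # Second-level column labels where this row of df_sub is not NaN
    labels = row.index[row.notna()].tolist()
    # .values strips the Series labels, so the assignment is positional
    df.loc[idx, pd.IndexSlice['B', labels]] = row[labels].values

result = df['B']
print(result)
```

Only the cells selected by labels are written, so the 'A' block and the untouched rows of 'B' keep their original values.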
pandas.DataFrame.align and pandas.DataFrame.fillna
Inline, by using the level parameter:
pd.DataFrame.fillna(*df_sub.align(df, level=1))
A B
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 -51.0 -50.0 -49.0 -48.0 -47.0 -46.0 -45.0 -44.0 -43.0 -42.0
1 11.0 12.0 13.0 14.0 15.0 0.5 0.6 0.7 19.0 20.0 -41.0 -40.0 -39.0 -38.0 -37.0 0.5 0.6 0.7 -33.0 -32.0
2 21.0 22.0 23.0 24.0 25.0 26.0 27.0 28.0 29.0 30.0 -31.0 -30.0 -29.0 -28.0 -27.0 -26.0 -25.0 -24.0 -23.0 -22.0
3 31.0 32.0 33.0 0.3 0.4 0.5 37.0 38.0 39.0 40.0 -21.0 -20.0 -19.0 0.3 0.4 0.5 -15.0 -14.0 -13.0 -12.0
4 41.0 42.0 43.0 44.0 45.0 46.0 47.0 48.0 49.0 50.0 -11.0 -10.0 -9.0 -8.0 -7.0 -6.0 -5.0 -4.0 -3.0 -2.0
update
In place:
df.update(df_sub.align(df, level=1)[0])
This:
pd.DataFrame.fillna(*df_sub.align(df, level=1))
is equivalent to
a, b = df_sub.align(df, level=1)
a.fillna(b)
# Or pd.DataFrame.fillna(a, b)
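Note that in the align/fillna output shown above, the df_sub values land in both the 'A' and the 'B' blocks. If only 'B' should change, as in the question, one possible variant is to fill df_sub from the current 'B' block and write the result back as a plain array. This is a sketch of a different technique, not taken from the answers above, and the names rows, current, and merged are hypothetical.

```python
import numpy as np
import pandas as pd

# Rebuild the example frames from the question
cols = pd.MultiIndex.from_product([['A', 'B'], np.arange(0, 10)])
df = pd.DataFrame(
    np.concatenate((np.arange(1., 51.).reshape(5, 10),
                    np.arange(-51., -1.).reshape(5, 10)), axis=1),
    index=np.arange(0, 5), columns=cols)
df_sub = pd.DataFrame(
    [[np.nan] * 5 + [0.5, 0.6, 0.7] + [np.nan] * 2,
     [np.nan] * 3 + [0.3, 0.4, 0.5] + [np.nan] * 4],
    index=[1, 3], columns=np.arange(0, 10))

rows = df_sub.index
# Selecting 'B' drops the top column level, leaving columns 0..9 like df_sub
current = df.loc[rows, 'B']
# Keep df_sub where it has a value, fall back to the current 'B' value elsewhere
merged = df_sub.fillna(current)
# Write back as a NumPy array so no label alignment takes place
df.loc[rows, 'B'] = merged.to_numpy()

print(df['B'])
```

Because the write is restricted to df.loc[rows, 'B'], the 'A' block is left as it was.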