How do I assign a new column to a slice of a pandas DataFrame with a MultiIndex?

Question

I have a pandas DataFrame with a MultiIndex like this:

import pandas as pd
import numpy as np

arr = [1]*3 + [2]*3
arr2 = list(range(3)) + list(range(3))
mux = pd.MultiIndex.from_arrays([
    arr,
    arr2
], names=['one', 'two'])

df = pd.DataFrame({'a': np.arange(len(mux))}, mux)
df

I have a function that takes a slice of the DataFrame and needs to assign a new column to the rows in that slice:

def work(df):
    b = df.copy()

    #do some work on the slice and create values for a new column of the slice
    b['b'] = b['a']*2

    #assign the new values back to the slice in a new column
    df['b'] = b['b']

#pass in a slice of the df with only records that have the last value for 'two'
work(df.loc[df.index.isin(df.index.get_level_values('two')[-1:], level=1)])

But calling the function results in an error:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  # This is added back by InteractiveShellApp.init_path()

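How can I create a new column 'b' in the original DataFrame and assign its values only for the rows that were passed to the function, leaving the rest of the rows as NaN?

The desired output is: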

        a   b
one two 
1   0   0   nan
1   1   1   nan
1   2   2   4
2   0   3   nan
2   1   4   nan
2   2   5   10

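Note: in the work function I am actually performing a complex series of operations, including calls to other functions that generate the values for the new column, so I don't think this will work. Multiplying by 2 in my example is for illustration only.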

Answer 1

You don't actually have an error, only a warning. Try this:

def work(df):
    b = df.copy()

    #do some work on the slice and create values for a new column of the slice
    b['b'] = b['a']*2

    #assign the new values back to the slice in a new column
    df['b'] = b['b']
    return df

#pass in a slice of the df with only records that have the last value for 'two'
new_df = work(df.loc[df.index.isin(df.index.get_level_values('two')[-1:], level=1)])

Then:

df.reset_index().merge(new_df, how="left").set_index(["one","two"])

Output:

          a     b
one two     
1   0      0    NaN
    1      1    NaN
    2      2    4.0
2   0      3    NaN
    1      4    NaN
    2      5    10.0
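As a possible alternative to the merge, here is a minimal sketch that avoids writing to the slice at all. It assumes work can be changed to return the new values as a Series indexed like the slice it received (that return-a-Series design and the names part, last_two and mask are my assumptions, not part of the original code). Assigning that Series to df['b'] aligns on the index, so rows outside the slice end up as NaN.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
mux = pd.MultiIndex.from_arrays(
    [[1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2]], names=["one", "two"]
)
df = pd.DataFrame({"a": np.arange(len(mux))}, index=mux)


def work(part):
    # Stand-in for the real, more complex computation (assumption):
    # return the new values as a Series that keeps the slice's index.
    return part["a"] * 2


# Boolean mask for rows whose 'two' level equals the last value of that level.
last_two = df.index.get_level_values("two")[-1:]
mask = df.index.isin(last_two, level="two")

# Column assignment aligns the returned Series on the index;
# rows that were not passed to work() are filled with NaN.
df["b"] = work(df.loc[mask])
print(df)
```

Because nothing is assigned on the slice itself, this version should not trigger SettingWithCopyWarning.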

Answer 2

I don't think you need a separate function at all. Try this...

df['b'] = df['a'].where(df.index.isin(df.index.get_level_values('two')[-1:], level=1))*2

Here, Series.where(), called on df['a'], should return a Series whose values are NaN for the rows that your query does not select.
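To make the intermediate step visible, here is a small sketch of the same one-liner split in two, using the example frame from the question. The variable names mask and kept are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
mux = pd.MultiIndex.from_arrays(
    [[1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2]], names=["one", "two"]
)
df = pd.DataFrame({"a": np.arange(len(mux))}, index=mux)

# True only for rows whose 'two' level equals the last value of that level.
mask = df.index.isin(df.index.get_level_values("two")[-1:], level="two")

# where() keeps values where the mask is True and puts NaN elsewhere,
# so the integer column becomes float.
kept = df["a"].where(mask)
print(kept)

# NaN propagates through the arithmetic, leaving unselected rows as NaN.
df["b"] = kept * 2
print(df)
```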
