How can I assign a new column to a slice of a pandas DataFrame with a MultiIndex?
I have a pandas DataFrame with a MultiIndex like this:
import pandas as pd
import numpy as np
arr = [1]*3 + [2]*3
arr2 = list(range(3)) + list(range(3))
mux = pd.MultiIndex.from_arrays([
    arr,
    arr2
], names=['one', 'two'])
df = pd.DataFrame({'a': np.arange(len(mux))}, mux)
df
         a
one two
1   0    0
    1    1
    2    2
2   0    3
    1    4
    2    5
I have a function that takes a slice of the DataFrame and needs to assign a new column to the rows that were sliced:
def work(df):
    b = df.copy()
    # do some work on the slice and create values for a new column of the slice
    b['b'] = b['a']*2
    # assign the new values back to the slice in a new column
    df['b'] = b['b']

# pass in a slice of the df with only records that have the last value for 'two'
work(df.loc[df.index.isin(df.index.get_level_values('two')[-1:], level=1)])
However, calling the function results in an error:
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
# This is added back by InteractiveShellApp.init_path()
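How can I create a new column 'b' in the original DataFrame and assign its values only for the rows that were passed to the function, leaving the rest of the rows as NaN?
The desired output is: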
         a    b
one two
1   0    0  nan
    1    1  nan
    2    2    4
2   0    3  nan
    1    4  nan
    2    5   10
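Note: in the work function I'm actually doing a complex series of operations, including calling other functions, to generate the values for the new column, so I don't think a simpler shortcut would work. The multiplication by 2 in my example is for illustration only.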
You don't actually have an error, only a warning. Try this:
def work(df):
    b = df.copy()
    # do some work on the slice and create values for a new column of the slice
    b['b'] = b['a']*2
    # assign the new values back to the slice in a new column
    df['b'] = b['b']
    return df

# pass in a slice of the df with only records that have the last value for 'two'
new_df = work(df.loc[df.index.isin(df.index.get_level_values('two')[-1:], level=1)])
Then:
df.reset_index().merge(new_df, how="left").set_index(["one","two"])
Output:
         a     b
one two
1   0    0   NaN
    1    1   NaN
    2    2   4.0
2   0    3   NaN
    1    4   NaN
    2    5  10.0
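A possible variation on this answer, in case the warning and the merge step are both unwanted: have the function return only the new values as a Series and assign that Series to a new column of the original frame, relying on pandas aligning on the index. This is a minimal sketch, not the answerer's code; the names `part`, `mask` and `last_two` are introduced here for illustration, and the multiplication stands in for the real computation.

```python
import numpy as np
import pandas as pd

mux = pd.MultiIndex.from_arrays(
    [[1] * 3 + [2] * 3, list(range(3)) * 2], names=['one', 'two'])
df = pd.DataFrame({'a': np.arange(len(mux))}, mux)

def work(part):
    # Work on an explicit copy so the frame passed in is never modified.
    b = part.copy()
    b['b'] = b['a'] * 2   # placeholder for the real computation
    return b['b']         # Series indexed like the rows that were passed in

# Boolean mask: rows whose 'two' level equals the last value of that level.
last_two = df.index.get_level_values('two')[-1:]
mask = df.index.isin(last_two, level='two')

# Column assignment aligns on the index; rows missing from the Series get NaN.
df['b'] = work(df.loc[mask])
print(df)
```

Because the function only ever writes to its own copy, nothing is set on a slice of `df`, so the SettingWithCopyWarning should not appear.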
I don't think you need a separate function at all. Try this...
df['b'] = df['a'].where(df.index.isin(df.index.get_level_values('two')[-1:], level=1))*2
Here, the Series.where() function called on df['a'] should return a Series whose values are NaN for the rows that are not selected by your query.
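To see that behaviour in isolation, here is a small sketch on a plain Series. The condition `keep` is a made-up stand-in for the MultiIndex query above.

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3, 4, 5])
keep = s % 3 == 2        # hypothetical condition, True at positions 2 and 5

masked = s.where(keep)   # values where keep is False are replaced by NaN
result = masked * 2      # NaN propagates through the arithmetic
print(result)
```

Only the rows where the condition holds keep a value, and the column becomes float because NaN cannot be stored in an integer Series.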