How can I assign a new column to a slice of a pandas DataFrame with a MultiIndex?
I have a pandas DataFrame with a MultiIndex like this:
import pandas as pd
import numpy as np
arr = [1]*3 + [2]*3
arr2 = list(range(3)) + list(range(3))
mux = pd.MultiIndex.from_arrays([
    arr,
    arr2
], names=['one', 'two'])
df = pd.DataFrame({'a': np.arange(len(mux))}, mux)
df
         a
one two
1   0    0
1   1    1
1   2    2
2   0    3
2   1    4
2   2    5
I have a function that takes a slice of the DataFrame and needs to assign a new column to the rows that were sliced:
def work(df):
    b = df.copy()
    # do some work on the slice and create values for a new column of the slice
    b['b'] = b['a']*2
    # assign the new values back to the slice in a new column
    df['b'] = b['b']

# pass in a slice of the df with only records that have the last value for 'two'
work(df.loc[df.index.isin(df.index.get_level_values('two')[-1:], level=1)])
But calling the function results in an error:
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
# This is added back by InteractiveShellApp.init_path()
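How can I create a new column 'b' in the original DataFrame and assign its values only to the rows that were passed to the function, leaving the rest of the rows as nan?
The desired output is: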
         a    b
one two
1   0    0  nan
1   1    1  nan
1   2    2    4
2   0    3  nan
2   1    4  nan
2   2    5   10
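Note: in the work function I am actually performing a complex series of operations, including calling other functions to generate the values for the new column, so I don't think this will work. Multiplying by 2 in my example is for illustration only.
You don't actually have an error, just a warning. Try this: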
def work(df):
    b = df.copy()
    # do some work on the slice and create values for a new column of the slice
    b['b'] = b['a']*2
    # assign the new values back to the slice in a new column
    df['b'] = b['b']
    return df

# pass in a slice of the df with only records that have the last value for 'two'
new_df = work(df.loc[df.index.isin(df.index.get_level_values('two')[-1:], level=1)])
Then:
df.reset_index().merge(new_df, how="left").set_index(["one","two"])
Output:
         a     b
one two
1   0    0   NaN
    1    1   NaN
    2    2   4.0
2   0    3   NaN
    1    4   NaN
    2    5  10.0
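One caveat about the merge above: with no `on` argument, `merge` joins on the columns the two frames share, which here is only 'a'. That works in this example because the values in 'a' are unique. A minimal sketch of an alternative, assuming the function can be changed to return the new values instead of assigning them to the slice: assigning a Series to a new column aligns on the index, so rows missing from the Series become NaN. The name `compute_b` is hypothetical and stands in for the real, more complex work.

```python
import numpy as np
import pandas as pd

mux = pd.MultiIndex.from_arrays(
    [[1] * 3 + [2] * 3, list(range(3)) * 2], names=["one", "two"]
)
df = pd.DataFrame({"a": np.arange(len(mux))}, index=mux)


def compute_b(part):
    # Hypothetical stand-in for the real work: returns a Series that
    # keeps the (one, two) index of the rows it was given.
    return part["a"] * 2


# Rows whose 'two' level equals the last value of that level.
last_two = df.index.get_level_values("two")[-1:]
mask = df.index.isin(last_two, level="two")

# Column assignment aligns on the index; unselected rows get NaN.
df["b"] = compute_b(df.loc[mask])
print(df)
```

Because nothing is written to the slice itself, this version should not trigger SettingWithCopyWarning.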
I don't think you need a separate function at all. Try this...
df['b'] = df['a'].where(df.index.isin(df.index.get_level_values('two')[-1:], level=1))*2
Here, the Series.where() function called on df['a'] should return a Series whose values are NaN for the rows that your query does not select.
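To make the intermediate step visible, here is a small sketch that splits the one-liner into its parts, using the same example frame as the question. The variable names `mask`, `kept` and `doubled` are only for illustration.

```python
import numpy as np
import pandas as pd

mux = pd.MultiIndex.from_arrays(
    [[1] * 3 + [2] * 3, list(range(3)) * 2], names=["one", "two"]
)
df = pd.DataFrame({"a": np.arange(len(mux))}, index=mux)

# Boolean array: True where the 'two' level equals its last value.
last_two = df.index.get_level_values("two")[-1:]
mask = df.index.isin(last_two, level="two")

# where() keeps values where the condition is True and puts NaN elsewhere.
kept = df["a"].where(mask)

# Arithmetic leaves the NaN rows as NaN.
doubled = kept * 2
print(doubled)
```

Note that the column comes back as float, since NaN cannot be stored in an integer column by default.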