Passing DataFrame slice as argument to function without 'SettingWithCopyWarning'
I have a function that takes a dataframe as an argument and, while processing it, calls another function, passing a slice of the same dataframe to that secondary function.
All changes are done in place, so nothing is returned (because of the size of the dataframe).
But this secondary function raises SettingWithCopyWarning, since it no longer deals with the original dataframe.
Here is an example:
import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(9).reshape(3,3), columns=list('abc'))
print(df)

# Note: DataFrame.is_copy exists only in older pandas releases
# (it was later deprecated and removed).

def a(df):
    if df.is_copy:
        print('a got a copy')
    df['a'] = 'a'

def b(df):
    if df.is_copy:
        print('b got a copy')
        print(df.is_copy)
    df.loc[:,'b'] = 'b'

def c(df):
    a(df)
    b(df.loc[0:1,:])  # passes a slice, so b works on a copy
    if df.is_copy:
        print('c got a copy')
    df.loc[0:1,'c'] = 'c'

def d(df):
    new_df = df.loc[0:1].copy(deep=True)
    b(new_df)
    df.update(new_df)  # write the changes back into the original
    del new_df

c(df)
df
Results in:
b got a copy
<weakref at 000000000C1DE778; to 'DataFrame' at 000000000C1B9DA0>
   a  b  c
0  a  1  c
1  a  4  c
2  a  7  8
I understand that one option is to create a new dataframe from the slice of the original, pass it to b, and then call df.update(new_df); d shows that this works:
d(df)
df
Produces the desired output:
   a  b  c
0  a  b  c
1  a  b  c
2  a  7  8
But is there a way to deal with this without creating a new dataframe and without raising SettingWithCopyWarning?
Another complication is that the call to b from within c might sometimes be just a simple b(df), so slicing is optional.
Thank you.
If you want to modify things in place, it is much better to simply pass the frame and a mask around.
def b(df, row_mask):
    df.loc[row_mask,'b'] = 'foo'
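As a rough sketch of how the frame-plus-mask approach could also cover the "slicing is optional" case from the question: the helper name `set_b`, its `value` parameter, the `None` default, and the sample frame are all assumptions made for illustration, not part of the original answer. The 'b' column holds strings here so that the assignment does not change the column's dtype.

```python
import pandas as pd

def set_b(df, row_mask=None, value='foo'):
    """Hypothetical helper: write `value` into column 'b' of the caller's frame.

    When no mask is given, every row is selected, so the same function
    serves both the sliced and the unsliced call.
    """
    if row_mask is None:
        row_mask = slice(None)  # select all rows
    # A single .loc call on the original frame: no intermediate slice object,
    # so there is nothing for SettingWithCopyWarning to complain about.
    df.loc[row_mask, 'b'] = value

# Illustrative data (an assumption, not the frame from the question).
df = pd.DataFrame({'a': [0, 3, 6], 'b': ['x', 'y', 'z'], 'c': [2, 5, 8]})

set_b(df, df.index <= 1)      # modify only the rows labelled 0 and 1
first = df['b'].tolist()

set_b(df, value='bar')        # no mask: modify every row
second = df['b'].tolist()
```

The caller keeps ownership of the frame throughout; only the description of which rows to touch is passed along.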
Though usually I wouldn't modify things like this, especially if it's a big frame. These modifications trigger a copy when you are changing dtypes (e.g. putting 'b' into a column that holds only numbers is normally not something you should do; dtypes are column-based).
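To illustrate the point that dtypes are column-based, here is a small sketch with made-up data: a column of numbers has an integer dtype, while a column that mixes a string with numbers can only be stored as generic Python objects. The variable names are assumptions for illustration.

```python
import pandas as pd

numbers = pd.Series([1, 4, 7])    # all integers
mixed = pd.Series(['b', 4, 7])    # a string mixed in with integers

# One dtype per column: the mixed column falls back to object.
numbers_is_int = pd.api.types.is_integer_dtype(numbers)
mixed_is_object = pd.api.types.is_object_dtype(mixed)
```

Moving a column from the first kind to the second means its data has to be stored differently, which is why such an assignment cannot be a cheap in-place write.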
So a better workflow is to do this:
def b(df):
    sliced = df.loc[0:1].copy()
    sliced.loc[:,'b'] = 'foo'
    return sliced
Then you can simply concatenate at the end:
# b(df) already returns labels 0 and 1, so the untouched remainder starts at label 2
result = pd.concat([b(df), df.loc[2:]])
Then produce a chain of these and concat them all at once. This will be much more efficient than modifying in place (though if you are only modifying a small number of values, my first method might work better). YMMV.
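A minimal sketch of the copy-then-concat workflow, assuming a hypothetical helper `fill_b` and illustrative data (neither is from the original answer). Each step returns a modified copy of its piece, and the pieces are joined once at the end.

```python
import pandas as pd

def fill_b(piece, value):
    """Hypothetical helper: return a modified copy, leaving the input untouched."""
    out = piece.copy()   # explicit copy, so no SettingWithCopyWarning
    out['b'] = value     # whole-column assignment on the copy
    return out

# Illustrative data (an assumption, not the frame from the question).
df = pd.DataFrame({'a': [0, 3, 6], 'b': ['x', 'y', 'z'], 'c': [2, 5, 8]})

pieces = [
    fill_b(df.loc[0:1], 'foo'),  # labels 0 and 1 (label slices include both ends)
    df.loc[2:],                  # untouched remainder
]
result = pd.concat(pieces)       # a single concat at the end
```

The pieces must not overlap; because `.loc` label slices include both endpoints, the remainder starts at label 2 rather than 1.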