
Passing DataFrame slice as argument to function without 'SettingWithCopyWarning'

I have a function that takes a DataFrame as an argument. While processing it, the function calls a second function and passes it a slice of the same DataFrame.

All changes are made in place, so nothing is returned (because of the size of the DataFrame).

However, the second function raises SettingWithCopyWarning, since it is no longer working with the original DataFrame.

Here is an example:

import pandas as pd
import numpy as np

# Note: df.is_copy comes from older pandas releases; it was later deprecated and removed.
df = pd.DataFrame(np.arange(9).reshape(3,3), columns=list('abc'))
print(df)

def a(df):
    if df.is_copy:
        print('a got a copy')
    df['a'] = 'a'

def b(df):
    if df.is_copy:
        print('b got a copy')
        print(df.is_copy)
    df.loc[:,'b'] = 'b'

def c(df):
    a(df)
    b(df.loc[0:1,:])  # b receives a slice, which triggers the warning
    if df.is_copy:
        print('c got a copy')
    df.loc[0:1,'c'] = 'c'

def d(df):
    # Work on an explicit copy, then write the result back into df
    new_df = df.loc[0:1].copy(deep=True)
    b(new_df)
    df.update(new_df)
    del new_df

c(df)
df

Results in:

b got a copy
<weakref at 000000000C1DE778; to 'DataFrame' at 000000000C1B9DA0>

   a  b  c
0  a  1  c
1  a  4  c
2  a  7  8

I understand that one option is to create a new DataFrame from the slice of the original, pass it to b, and then call df.update(new_df). Function d shows that this works:

d(df)
df

Produces the desired output:

   a  b  c
0  a  b  c
1  a  b  c
2  a  7  8

But is there a way to handle this without creating a new DataFrame and without raising SettingWithCopyWarning?

Another complication is that the call to b from within c might sometimes be a plain b(df), so the slicing is optional.

Thank you.

If you want to modify things in place, it is much better to simply pass the frame and a mask around.

def b(df, row_mask):
    df.loc[row_mask,'b'] = 'foo'
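
As a minimal sketch of how the mask approach could also cover the "slicing is optional" case from the question: the `row_mask=None` default below is a hypothetical convention meaning "all rows", not something from the original answer. The column is cast to object first so that writing strings into it does not change its dtype during the assignment.

```python
import numpy as np
import pandas as pd


def b(df, row_mask=None):
    """Set column 'b' in place on the rows selected by row_mask.

    row_mask=None is an assumed convention meaning "every row".
    """
    if row_mask is None:
        row_mask = slice(None)
    # Indexing the original frame with .loc writes into it directly,
    # so no intermediate slice is created.
    df.loc[row_mask, 'b'] = 'foo'


df = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list('abc'))
# Cast up front so the column can hold strings and numbers side by side.
df['b'] = df['b'].astype(object)

b(df, df.index <= 1)          # only rows 0 and 1
partial = df['b'].tolist()

b(df)                         # no mask: every row
full = df['b'].tolist()
```

Because the function always receives the original frame, the caller decides which rows are touched by building the mask, and the function never sees a slice.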

Though usually I wouldn't modify things like this, especially if it's a big frame. These modifications trigger a copy when you are changing dtypes (e.g. putting 'b' in a column of all numbers is normally not something you should do; dtypes are column-based).
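
To make the dtype point concrete, here is a small sketch that checks a column's dtype before and after strings are assigned to it. It replaces the whole column rather than a subset of rows, which is an assumption made to keep the example simple; it illustrates only that dtypes are tracked per column, not how pandas copies data internally.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_integer_dtype

df = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list('abc'))

was_integer = is_integer_dtype(df['b'])   # column starts out numeric

# Assigning a string to the whole column replaces it with a new column
# of a different dtype; the other columns keep theirs.
df['b'] = 'b'

still_integer = is_integer_dtype(df['b'])
others_integer = is_integer_dtype(df['a']) and is_integer_dtype(df['c'])
```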

So a better workflow is to do this:

def b(df):
    sliced = df.loc[0:1].copy()
    sliced.loc[:,'b'] = 'foo'
    return sliced

Then you can simply concatenate at the end:

# b(df) already returns rows 0 and 1, so append only the remaining rows
result = pd.concat([b(df), df.iloc[2:]])

Then produce a chain of these and concat them all at once. This will be much more efficient than modifying in place (though if you are only modifying a small number of values, my first method might work better). YMMV.
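
The copy-then-concat workflow above might look like the following sketch. The helper name `label_rows` and its parameters are hypothetical, chosen only for illustration; the idea is that each piece is an explicit copy, so modifying it raises no warning and leaves the original frame untouched.

```python
import numpy as np
import pandas as pd


def label_rows(df, start, stop, value):
    """Return a modified copy of rows start..stop (labels, inclusive)."""
    sliced = df.loc[start:stop].copy()
    sliced['b'] = value       # the copy owns its data, so this is safe
    return sliced


df = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list('abc'))

pieces = [
    label_rows(df, 0, 1, 'foo'),   # modified rows 0 and 1
    df.loc[2:],                    # untouched remainder
]
# Concatenate once at the end instead of growing the result step by step.
result = pd.concat(pieces)
```

Collecting the pieces in a list and calling `pd.concat` a single time avoids repeatedly reallocating the frame.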
