
SettingWithCopyWarning in Pandas DataFrame using Python

My task is to drop all rows containing NaNs and encode all the categorical variables in data.

I wrote a function that looks like this:

from sklearn.preprocessing import LabelEncoder

def preprocess_data(data):

    data = data.dropna()
    le = LabelEncoder()
    data['car name'] = le.fit_transform(data['car name'])

    return data

It takes a DataFrame and returns the processed data. Running this function gives me a warning that says:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

I don't quite understand which part of my code is causing this, or how to fix it.
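A minimal reproduction, as a sketch under an assumption the question does not confirm: when dropna() actually removes rows, it returns a filtered subset of the input, and assigning a column into that subset is what pandas flags. The sample data is made up, and pd.factorize(..., sort=True) stands in for scikit-learn's LabelEncoder so the example needs only pandas and NumPy (both map labels to integer codes in sorted label order). Whether the warning is printed depends on your pandas version; with Copy-on-Write enabled, SettingWithCopyWarning is no longer raised.

```python
import warnings

import numpy as np
import pandas as pd


def preprocess_data(data):
    # dropna() removes row 1 below, so the result is a filtered subset of `data`
    data = data.dropna()
    # Assigning a column into that subset is what pandas may flag.
    # pd.factorize(sort=True) is a stand-in for LabelEncoder().fit_transform
    data['car name'] = pd.factorize(data['car name'], sort=True)[0]
    return data


# Illustrative data: one NaN in 'mpg', so exactly one row is dropped
df = pd.DataFrame({
    'car name': ['nissan', 'dacia', 'ford', 'audi'],
    'mpg': [18.0, np.nan, 15.0, 22.0],
})

# Record every warning raised while the function runs
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    result = preprocess_data(df)

# On pandas versions without Copy-on-Write this typically includes SettingWithCopyWarning
print([w.category.__name__ for w in caught])
print(result)
```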

Make sure you tell pandas that data is its own DataFrame (and not a slice of another one) by calling .copy():

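from sklearn.preprocessing import LabelEncoder

def preprocess_data(data):

    # .copy() makes data an independent DataFrame rather than something pandas tracks as a slice of the input
    data = data.dropna().copy()
    le = LabelEncoder()
    data['car name'] = le.fit_transform(data['car name'])

    return data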
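To check the .copy() fix on data that actually contains a NaN, the sketch below records the warnings raised during the call and confirms no SettingWithCopyWarning appears. The sample frame is made up, and pd.factorize(..., sort=True) is used as a pandas-only stand-in for LabelEncoder (an assumption for self-containment; the real fix works the same with LabelEncoder).

```python
import warnings

import numpy as np
import pandas as pd


def preprocess_data(data):
    # .copy() gives an independent DataFrame, so later assignments are not flagged
    data = data.dropna().copy()
    # Stand-in for LabelEncoder().fit_transform: integer codes in sorted label order
    data['car name'] = pd.factorize(data['car name'], sort=True)[0]
    return data


# Illustrative data with one NaN, so dropna() really removes a row
df = pd.DataFrame({
    'car name': ['nissan', 'dacia', 'ford', 'audi'],
    'mpg': [18.0, np.nan, 15.0, 22.0],
})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    result = preprocess_data(df)

warning_names = [w.category.__name__ for w in caught]
print(warning_names)  # no SettingWithCopyWarning expected here
print(result)
```

The extra copy costs memory proportional to the filtered frame; for a preprocessing step like this that is usually negligible.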

A more detailed explanation is available here: https://github.com/pandas-dev/pandas/issues/17476
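Another way to sidestep the assignment into the dropna() result, offered as an alternative rather than what the linked issue recommends, is DataFrame.assign(), which returns a new DataFrame with the column added or replaced. The callable receives that intermediate frame, so the codes line up with the rows that survived dropna(). Because 'car name' contains a space, it is passed through dictionary unpacking. pd.factorize(..., sort=True) is an assumed pandas-only stand-in for LabelEncoder, and the data is illustrative.

```python
import numpy as np
import pandas as pd


def preprocess_data(data):
    # assign() builds a new DataFrame instead of mutating the dropna() result in place;
    # the lambda is called with that frame, so the codes match the remaining rows
    return data.dropna().assign(
        **{'car name': lambda d: pd.factorize(d['car name'], sort=True)[0]}
    )


# Illustrative data with one NaN, so one row is dropped
df = pd.DataFrame({
    'car name': ['nissan', 'dacia', 'ford', 'audi'],
    'mpg': [18.0, np.nan, 15.0, 22.0],
})

result = preprocess_data(df)
print(result)
```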

Maybe you should give more information, or perhaps the problem is not in this method. The following code does not produce a warning:

import pandas as pd
from sklearn import preprocessing

def preprocess_data(data):

    data = data.dropna()
    le = preprocessing.LabelEncoder()
    data['car name'] = le.fit_transform(data['car name'])
    return data


preprocess_data(pd.DataFrame({'car name': ['nissan', 'dacia'], 'car mode': ['juke', 'logan']}))

#   car mode  car name
# 0     juke         1
# 1    logan         0
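One plausible explanation for the difference, offered as an assumption rather than something stated above: this example frame contains no NaNs, so dropna() removes nothing and the result is not treated as a slice of the input, whereas the question's data presumably does contain NaNs. The sketch runs the same function on a frame without and with a NaN and records the warnings raised; on pandas versions that still emit SettingWithCopyWarning, typically only the frame with a NaN triggers it. The helper run_and_record is hypothetical, and pd.factorize(..., sort=True) stands in for preprocessing.LabelEncoder.

```python
import warnings

import numpy as np
import pandas as pd


def preprocess_data(data):
    data = data.dropna()
    # Stand-in for preprocessing.LabelEncoder().fit_transform
    data['car name'] = pd.factorize(data['car name'], sort=True)[0]
    return data


def run_and_record(df):
    """Hypothetical helper: run preprocess_data and return (result, warning class names)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        result = preprocess_data(df)
    return result, [w.category.__name__ for w in caught]


no_nan = pd.DataFrame({'car name': ['nissan', 'dacia'], 'car mode': ['juke', 'logan']})
with_nan = pd.DataFrame({'car name': ['nissan', 'dacia', 'ford'],
                         'car mode': ['juke', 'logan', np.nan]})

result_no_nan, names_no_nan = run_and_record(no_nan)
result_with_nan, names_with_nan = run_and_record(with_nan)
print(names_no_nan)    # dropna() removed nothing
print(names_with_nan)  # dropna() removed the 'ford' row
```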
