
A value is trying to be set on a copy of a slice from a DataFrame. - pandas

I'm new to pandas and, given a data frame, I was trying to drop the rows that don't meet a specific requirement. Researching how to do it, I arrived at this structure:

df = df.loc[df['DS_FAMILIA_PROD'].isin(['CARTOES', 'CARTÕES'])]

However, when processing the frame, I get this warning:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self[name] = value

I'm not sure what to do, because I'm already using .loc. What am I missing?

f = ['ID_manifest', 'issue_date', 'channel', 'product', 'ID_client', 'desc_manifest']

df = pd.DataFrame(columns=f)
for chunk in df2017_chunks:
    aux = preProcess(chunk, f)
    df = pd.concat([df, aux])

def preProcess(df, f):    
    stops = list(stopwords.words("portuguese"))
    stops.extend(['reclama', 'cliente', 'santander', 'cartao', 'cartão'])

    df = df.loc[df['DS_FAMILIA_PROD'].isin(['CARTOES', 'CARTÕES'])]

    df.columns = f
    df.desc_manifest = df.desc_manifest.str.lower() # All lower case
    df.desc_manifest = df.desc_manifest.apply(lambda x: re.sub('[^A-Za-zÀ-ÿ]', ' ', str(x))) # Just letters (A-z would also match [ \ ] ^ _ `)
    df.replace(['NaN', 'nan'], np.nan, inplace = True) # Remove NaN
    df.dropna(subset=['desc_manifest'], inplace=True)
    df.desc_manifest = df.desc_manifest.apply(lambda x: [word for word in str(x).split() if word not in stops]) # Remove stop words

    return df

The purpose of the warning is to tell users that they may be operating on a copy rather than on the original, but there can be false positives. As mentioned in the comments, this is not an issue for your use case.

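You can simply turn off the check for your dataframe (note that the is_copy attribute was deprecated in pandas 0.23 and is not available in recent releases, so this only works on older versions):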

df.is_copy = False

or you can explicitly copy:

df = df.loc[df['DS_FAMILIA_PROD'].isin(['CARTOES', 'CARTÕES'])].copy()

You need copy() because the filtered frame is a new object derived from the original data: if you modify values in it later, the modifications do not propagate back to the original, and pandas warns you about that.

loc can be omitted, but without copy() the warning is still raised.

df = pd.DataFrame({'DS_FAMILIA_PROD':['a','d','b'],
                   'desc_manifest':['F','rR', 'H'],
                   'C':[7,8,9]})

def preProcess(df):    
    df = df[df['DS_FAMILIA_PROD'].isin([u'a', u'b'])].copy()
    df.desc_manifest = df.desc_manifest.str.lower() # All lower case
    ...
    ...
    return df


print (preProcess(df))
   C DS_FAMILIA_PROD desc_manifest
0  7               a             f
2  9               b             h
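
To check the claim that changes made after copy() do not propagate back, here is a minimal sketch. The names original and filtered are hypothetical and used only for illustration; the data is the same toy frame as above.

```python
import pandas as pd

# Toy frame reusing the column names from the example above.
original = pd.DataFrame({'DS_FAMILIA_PROD': ['a', 'd', 'b'],
                         'desc_manifest': ['F', 'rR', 'H']})

# Filter the rows and take an explicit copy, so that later
# assignments clearly target the new frame only.
filtered = original[original['DS_FAMILIA_PROD'].isin(['a', 'b'])].copy()
filtered['desc_manifest'] = filtered['desc_manifest'].str.lower()

print(filtered['desc_manifest'].tolist())   # values in the filtered copy
print(original['desc_manifest'].tolist())   # values in the untouched original
```

The filtered frame keeps the original index labels (0 and 2), and the original frame still holds the upper-case values.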

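If your program takes a copy of the df on purpose, you can stop the warning by setting the mode.chained_assignment option to None. Setting it back to 'warn' (the default) re-enables the warning:

pd.set_option('mode.chained_assignment', None)    # no warning when a value is set on a copy
pd.set_option('mode.chained_assignment', 'warn')  # default: a warning is shown when a value is set on a copy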

df = pd.DataFrame({'DS_FAMILIA_PROD' : [1, 2, 3], 'COL2' : [5, 6, 7]})
df = df[df.DS_FAMILIA_PROD.isin([1, 2])]
df
Out[29]: 
   COL2  DS_FAMILIA_PROD
0     5                1
1     6                2
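
If changing the option globally feels too broad, one possible alternative is to limit it to a block with pd.option_context. This is a sketch that assumes a pandas version in which the mode.chained_assignment option is available; the variable names subset, before, inside and after are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'DS_FAMILIA_PROD': [1, 2, 3], 'COL2': [5, 6, 7]})

# Remember the current setting so we can check that it is restored afterwards.
before = pd.get_option('mode.chained_assignment')

# The option is changed only inside the with-block.
with pd.option_context('mode.chained_assignment', None):
    inside = pd.get_option('mode.chained_assignment')
    subset = df[df.DS_FAMILIA_PROD.isin([1, 2])]
    # Assignment on the filtered frame; the check is disabled here.
    subset['COL2'] = subset['COL2'] * 10

# On leaving the block, the previous value is back in place.
after = pd.get_option('mode.chained_assignment')
```

This keeps the warning active for the rest of the program, which may help catch assignments that really were meant for the original frame.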
