SettingWithCopyWarning message in Pandas/Python with df.loc

OBS: I've spent a few hours searching SO, the Pandas docs and a few other websites, but couldn't understand why my code isn't working.

My UDF:

def indice(dfb, lb, ub):
    dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
    dfb = dfb[~dfb.isOutlier]

    dfb['indice'] = (dfb['valor_unitario'] - lb) / (ub - lb) * 2000
    dfb = dfb.astype({'indice': 'int64'})
    return dfb

Important:

  • The isOutlier column does not exist yet. I'm creating it in this function.
  • The indice column does not exist yet. I'm creating it in this function.
  • valor_unitario exists and is a float.
  • lb and ub are defined beforehand.
  • This function is called inside a loop in the main code (but the warning is raised from the first iteration, n=0, onward).

Warning raised:

C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)

I found a few articles and questions on the web and on StackOverflow saying that using loc would solve the problem. I tried it, but with no success.

1st try - Using loc

def indice(dfb, lb, ub):
    dfb.loc[:, 'isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)  # changed to .loc
    dfb = dfb[~dfb.isOutlier]

    dfb.loc[:, 'indice'] = (dfb['valor_unitario'] - lb) / (ub - lb) * 2000  # changed to .loc
    dfb = dfb.astype({'indice': 'int64'})
    return dfb

I also tried applying loc to one assignment at a time. In fact, I tried many possible combinations, such as using df.loc on dfb['valor_unitario'], and so on.

Now I get the same warning twice, in a slightly different form:

self._setitem_single_column(ilocs[0], value, pi) and self.obj[key] = value

C:\ProgramData\Anaconda3\envs\Indice\lib\site-packages\pandas\core\indexing.py:1676: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
-> self._setitem_single_column(ilocs[0], value, pi)

and

C:\ProgramData\Anaconda3\envs\Indice\lib\site-packages\pandas\core\indexing.py:1597: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
-> self.obj[key] = value

I also tried using copy(). The first time this warning showed up, simply using copy() solved the problem. I don't know why it no longer works (I only loaded more data).

2nd try - Using copy()

I tried placing copy() in three places, with no success:

dfb = dfb[~dfb.isOutlier].copy()

dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub).copy()

dfb['isOutlier'] = ~dfb['valor_unitario'].copy().between(lb, ub)

C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)

I have run out of ideas and would really appreciate your help.

------- Minimum Reproducible Example --------

Main_testing.py

import pandas as pd
import calculoindice_support as indice # module 01
import getitemsid_support as getitems # module 02

df = pd.DataFrame({'loja':[1,4,6,6,4,5,7,8],
                   'cod_produto':[21,21,21,55,55,43,26,30],
                   'valor_unitario':[332.21,333.40,333.39,220.40,220.40,104.66,65.00,14.00],
                   'documento':['324234','434144','532552','524523','524525','423844','529585','239484'],
                   'empresa':['ABC','ABC','ABC','ABC','ABC','CDE','CDE','CDE']
                   })

nome_coluna = 'cod_produto'
# getting items id to loop over them
product_ids = getitems.getitemsid(df, nome_coluna)

# initializing main DF with no data 
df_nf = pd.DataFrame(columns=list(df.columns.values))

n = 0
while n < len(product_ids):
    item = product_ids[n]
    df_item = df[df[nome_coluna] == item]
    # assigning bounds to each variable
    lb, ub = indice.limites(df_item, 10)
    # calculating index over DF, using LB and UB
    # creating temporary (for each loop) DF
    df_nf_aux = indice.indice(df_item, lb, ub)
    # assigning temporary DF to main DF that will be exported later
    df_nf = pd.concat([df_nf, df_nf_aux],ignore_index=True)
    n += 1

calculoindice_support.py (module 01)

import pandas as pd

def limites(dfa,n):
    n_sigma = n * dfa.valor_unitario.std()
    mean = dfa.valor_unitario.mean()
    lb: float = mean - n_sigma
    ub: float = mean + n_sigma
    return (lb, ub)


def indice(dfb, lb, ub):
    if lb == ub:
        dfb.loc[:, 'isOutlier'] = False
        dfb.loc[:, 'indice'] = 1
    else:
        dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
        dfb = dfb[~dfb.isOutlier]

        dfb['indice'] = (dfb['valor_unitario'] - lb) / (ub - lb) * 2000
        # df = df.astype({'indice': 'int64'})

    return dfb

getitemsid_support.py (module 02)

def getitemsid(df, coluna):
    a = df[coluna].tolist()
    return list(set(a))

Warning output:

C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
C:\ProgramData\Anaconda3\envs\Indice\lib\site-packages\pandas\core\indexing.py:1597: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[key] = value
C:\ProgramData\Anaconda3\envs\Indice\lib\site-packages\pandas\core\indexing.py:1720: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_column(loc, value, pi)
C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)

The problem is in your Main_testing.py:

while n < len(product_ids):
    df_item = df[df[nome_coluna] == item]

    df_nf_aux = indice.indice(df_item, lb, ub)

First you slice df with the condition df[nome_coluna] == item. This returns a copy of the dataframe (you can check this by inspecting its _is_view or _is_copy attribute). Then you pass that filtered dataframe to the indice function.

def indice(dfb, lb, ub):
    dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)

In the indice function, you assign a new column to the filtered dataframe. This is an implicit chained assignment. pandas doesn't know whether you want to add the new column to the original dataframe or only to the filtered one, so it gives you a warning.
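The situation described above can be reproduced in isolation. The following is only a minimal sketch: the small dataframe and the bounds 300.0 and 400.0 are illustrative values, not taken from the question. Since _is_copy is a private attribute that may not exist in every pandas version, it is read through getattr. Whether a SettingWithCopyWarning is actually recorded also depends on the pandas version and its copy-on-write settings, so the script prints what it observes instead of assuming a result.

```python
import warnings

import pandas as pd

df = pd.DataFrame({
    "cod_produto": [21, 21, 21, 55],
    "valor_unitario": [332.21, 333.40, 333.39, 220.40],
})

# Boolean-mask filtering returns a new object rather than a view of df.
df_item = df[df["cod_produto"] == 21]

# _is_copy is private and may be missing in some pandas versions,
# so getattr with a default is used instead of direct attribute access.
has_parent_ref = getattr(df_item, "_is_copy", None) is not None

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Same kind of assignment as in indice(): a new column on the filtered frame.
    # The bounds 300.0 and 400.0 are illustrative assumptions.
    df_item["isOutlier"] = ~df_item["valor_unitario"].between(300.0, 400.0)

warning_names = [w.category.__name__ for w in caught]

print("filtered frame keeps a reference to df:", has_parent_ref)
print("warnings recorded during the assignment:", warning_names)
print("column added to the original df:", "isOutlier" in df.columns)
```

In every case the new column lands on the filtered frame only; the warning is pandas asking whether that is what you intended.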

To suppress this warning, you can explicitly tell pandas what you want to do:

def indice(dfb, lb, ub):
    dfb = dfb.copy()
    dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)

In the above case, I create a copy of the filtered dataframe. This tells pandas that I want to add the new column to the filtered dataframe, not to the original.
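Applied to the whole function from the question, the same idea might look like the sketch below. One assumption goes beyond the snippet above: the line dfb = dfb[~dfb.isOutlier] produces another filtered dataframe inside the function, so a second .copy() is added there before the indice column is assigned. The sample data is a subset of the question's dataframe, and the bounds 330.0 and 335.0 are illustrative values rather than the output of limites().

```python
import pandas as pd


def indice(dfb, lb, ub):
    # Work on an explicit copy so the caller's dataframe is never modified.
    dfb = dfb.copy()
    if lb == ub:
        dfb["isOutlier"] = False
        dfb["indice"] = 1
    else:
        dfb["isOutlier"] = ~dfb["valor_unitario"].between(lb, ub)
        # Assumption: filtering again yields a new filtered frame,
        # so it is copied before another column is assigned to it.
        dfb = dfb[~dfb["isOutlier"]].copy()
        dfb["indice"] = (dfb["valor_unitario"] - lb) / (ub - lb) * 2000
    return dfb


df = pd.DataFrame({
    "cod_produto": [21, 21, 21, 55, 55],
    "valor_unitario": [332.21, 333.40, 333.39, 220.40, 220.40],
})

df_item = df[df["cod_produto"] == 21]
out = indice(df_item, 330.0, 335.0)  # illustrative bounds, not from limites()

print(out)
print("original columns unchanged:", list(df.columns))
```

An alternative with the same effect is to copy at the call site instead, i.e. df_item = df[df[nome_coluna] == item].copy() in the main loop, which keeps indice() free of defensive copies.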
