SettingWithCopyWarning message in Pandas/Python with df.loc
Note: I've spent a few hours searching SO, the Pandas docs and a few other websites, but couldn't understand why my code isn't working.
def indice(dfb, lb, ub):
    dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
    dfb = dfb[~dfb.isOutlier]
    dfb['indice'] = (dfb['valor_unitario'] - lb) / (ub - lb) * 2000
    dfb = dfb.astype({'indice': 'int64'})
    return dfb
isOutlier column does not exist. I'm creating it right now in this function.
indice column does not exist. I'm creating it right now in this function.
valor_unitario exists and it's a float.
lb and ub are previously defined.
C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
I found a few articles and questions on the web and on StackOverflow saying that using loc would solve the problem. I tried, but with no success.
def indice(dfb, lb, ub):
    dfb.loc[:, 'isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)  # changed to .loc
    dfb = dfb[~dfb.isOutlier]
    dfb.loc[:, 'indice'] = (dfb['valor_unitario'] - lb) / (ub - lb) * 2000  # changed to .loc
    dfb = dfb.astype({'indice': 'int64'})
    return dfb
I also tried using loc on one line at a time. In fact, I tried a lot of possible combinations, such as using df.loc in dfb['valor_unitario'] and so on.
Now I get the same warning twice, but a bit different: self._setitem_single_column(ilocs[0], value, pi) and self.obj[key] = value
C:\ProgramData\Anaconda3\envs\Indice\lib\site-packages\pandas\core\indexing.py:1676: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
-> self._setitem_single_column(ilocs[0], value, pi)
and
C:\ProgramData\Anaconda3\envs\Indice\lib\site-packages\pandas\core\indexing.py:1597: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
-> self.obj[key] = value
I also tried using copy. The first time this warning showed up, simply using copy() solved the problem. I don't know why it isn't working now (I just loaded more data).
I tried to place copy() in three places, with no success:
dfb = dfb[~dfb.isOutlier].copy()
dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub).copy()
dfb['isOutlier'] = ~dfb['valor_unitario'].copy().between(lb, ub)
C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
I have no more ideas and would greatly appreciate your support.
# Main_testing.py
import pandas as pd
import calculoindice_support as indice  # module 01
import getitemsid_support as getitems  # module 02

df = pd.DataFrame({'loja': [1, 4, 6, 6, 4, 5, 7, 8],
                   'cod_produto': [21, 21, 21, 55, 55, 43, 26, 30],
                   'valor_unitario': [332.21, 333.40, 333.39, 220.40, 220.40, 104.66, 65.00, 14.00],
                   'documento': ['324234', '434144', '532552', '524523', '524525', '423844', '529585', '239484'],
                   'empresa': ['ABC', 'ABC', 'ABC', 'ABC', 'ABC', 'CDE', 'CDE', 'CDE']
                   })
nome_coluna = 'cod_produto'
# getting items id to loop over them
product_ids = getitems.getitemsid(df, nome_coluna)
# initializing main DF with no data
df_nf = pd.DataFrame(columns=list(df.columns.values))
n = 0
while n < len(product_ids):
    item = product_ids[n]
    df_item = df[df[nome_coluna] == item]
    # assigning bounds to each variable
    lb, ub = indice.limites(df_item, 10)
    # calculating index over DF, using LB and UB
    # creating temporary (for each loop) DF
    df_nf_aux = indice.indice(df_item, lb, ub)
    # assigning temporary DF to main DF that will be exported later
    df_nf = pd.concat([df_nf, df_nf_aux], ignore_index=True)
    n += 1
# calculoindice_support.py
import pandas as pd

def limites(dfa, n):
    n_sigma = n * dfa.valor_unitario.std()
    mean = dfa.valor_unitario.mean()
    lb: float = mean - n_sigma
    ub: float = mean + n_sigma
    return (lb, ub)

def indice(dfb, lb, ub):
    if lb == ub:
        dfb.loc[:, 'isOutlier'] = False
        dfb.loc[:, 'indice'] = 1
    else:
        dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
        dfb = dfb[~dfb.isOutlier]
        dfb['indice'] = (dfb['valor_unitario'] - lb) / (ub - lb) * 2000
    # df = df.astype({'indice': 'int64'})
    return dfb
# getitemsid_support.py
def getitemsid(df, coluna):
    a = df[coluna].tolist()
    return list(set(a))
C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
C:\ProgramData\Anaconda3\envs\Indice\lib\site-packages\pandas\core\indexing.py:1597: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self.obj[key] = value
C:\ProgramData\Anaconda3\envs\Indice\lib\site-packages\pandas\core\indexing.py:1720: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_column(loc, value, pi)
C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
The problem is in your Main_testing.py:
while n < len(product_ids):
    df_item = df[df[nome_coluna] == item]
    df_nf_aux = indice.indice(df_item, lb, ub)
First you slice your df with the condition df[nome_coluna] == item. This returns a copy of the dataframe (you can check this by accessing the _is_view or _is_copy attribute). Then you pass that filtered dataframe to the indice method.
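As a rough way to check this yourself, the sketch below inspects the filtered frame. Note that _is_copy is a private attribute, so its presence and meaning depend on the pandas version; getattr with a default is used here for that reason, and the small sample frame is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'cod_produto': [21, 21, 55],
                   'valor_unitario': [332.21, 333.40, 220.40]})

# Boolean filtering builds a new frame derived from df.
df_item = df[df['cod_produto'] == 21]

# _is_copy is private and may be absent in some pandas versions,
# so fall back to None instead of raising AttributeError.
parent_ref = getattr(df_item, '_is_copy', None)
print('df_item tracks a parent frame:', parent_ref is not None)
print('df tracks a parent frame:', getattr(df, '_is_copy', None) is not None)
```

On versions that track the parent, this reference is what pandas consults when deciding whether to warn on a later assignment.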
def indice(dfb, lb, ub):
    dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
In the indice method, you assign a new column to the filtered dataframe. This is an implicit chained assignment. Pandas doesn't know whether you want to add the new column to the original dataframe or only to the filtered dataframe, so it gives you a warning.
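A minimal sketch of that situation, using a small made-up frame. Whether a warning is actually recorded depends on the pandas version (releases with copy-on-write enabled may record none), so the code only prints what it caught and does not assume a particular result.

```python
import warnings
import pandas as pd

df = pd.DataFrame({'cod_produto': [21, 21, 55],
                   'valor_unitario': [332.21, 333.40, 220.40]})

# Boolean filtering yields a new frame derived from df.
df_item = df[df['cod_produto'] == 21]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    # Same pattern as inside indice(): a new column on the filtered frame.
    df_item['isOutlier'] = False

# Names of any warning categories raised by the assignment (version dependent).
print([w.category.__name__ for w in caught])

# The new column exists on the filtered frame only; df is unchanged.
print('isOutlier' in df_item.columns)
print('isOutlier' in df.columns)
```

The assignment itself succeeds either way; the warning is only pandas flagging that the target was derived from another dataframe.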
To suppress this warning, you can explicitly tell pandas what you want to do:
def indice(dfb, lb, ub):
    dfb = dfb.copy()
    dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
In the above case, I create a copy of the filtered dataframe. This means I want to add the new column to the filtered dataframe, not the original.
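A fuller sketch of the same idea applied to the whole function from the question. The second .copy() after the outlier filter is my addition and is not in the snippet above: that filter also produces a derived frame, so the later 'indice' assignment would probably warn again without it. The lb == ub branch is left out for brevity, and the bounds and sample frame are illustrative values, not taken from limites().

```python
import pandas as pd

def indice(dfb, lb, ub):
    # Explicit copy: new columns go on this frame, never on the caller's slice.
    dfb = dfb.copy()
    dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
    # Filtering derives yet another frame, so copy again before assigning.
    dfb = dfb[~dfb['isOutlier']].copy()
    dfb['indice'] = (dfb['valor_unitario'] - lb) / (ub - lb) * 2000
    return dfb.astype({'indice': 'int64'})

df = pd.DataFrame({'cod_produto': [21, 21, 21, 55],
                   'valor_unitario': [332.21, 333.40, 333.39, 220.40]})
df_item = df[df['cod_produto'] == 21]

# Illustrative bounds (assumed here, not computed by limites()).
result = indice(df_item, 300.0, 340.0)
print(result)
```

Copying at the call site instead, with df_item = df[df[nome_coluna] == item].copy(), should have the same effect for the first assignment; which one to use mostly depends on whether you want indice to be safe for any caller.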