
Pandas SettingWithCopyWarning with np.where

I have a pandas DataFrame with a column containing a set of correlations (all float values). I'm trying to create another column that sorts these correlations into three distinct categories (high/medium/low). I do this using np.where:

df['Category'] = np.where(df['Correlation'] >= 0.5, 'high',
                          np.where(df['Correlation'] >= 0.3, 'medium', 'low'))

When I do this, I always get the SettingWithCopyWarning (although the assignment seems to work). I have read up on the difference between copies and views, and have even seen recommendations to use .where over other methods to avoid any confusion (and the SettingWithCopyWarning). I still can't quite wrap my head around why I get the warning with this method. Can someone explain?

Most likely your df was created as a view of another DataFrame, e.g.:

import numpy as np
import pandas as pd
data = pd.DataFrame({'Correlation': np.arange(0, 1.3, 0.1)})  # Your "initial" DataFrame
df = data.iloc[0:11]  # A row slice of data, not an independent copy

Now df holds a fragment of data, but it still uses the underlying data buffer of data rather than one of its own.
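One way to see the shared buffer through public API only is np.shares_memory. This is a minimal sketch, assuming the same example data as above; the variable name shares_slice is made up for illustration, and the exact behaviour can depend on your pandas version:

```python
import numpy as np
import pandas as pd

# Same example data as in the answer
data = pd.DataFrame({'Correlation': np.arange(0, 1.3, 0.1)})
df = data.iloc[0:11]  # row slice of `data`

# Compare the NumPy arrays backing the 'Correlation' column of each frame.
# `shares_slice` is a hypothetical name used only for this illustration.
shares_slice = np.shares_memory(
    df['Correlation'].to_numpy(),
    data['Correlation'].to_numpy(),
)
print(shares_slice)
```

If this prints True, the slice and the original frame are backed by the same memory, which is the situation the warning is about.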

If you then run:

df['Category'] = np.where(df['Correlation'] >= 0.5, 'high',
    np.where(df['Correlation'] >= 0.3, 'medium', 'low'))

you get exactly the warning you describe.
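To check whether the warning is raised in your own environment, you can record warnings around the assignment. This is only a sketch: the name warning_names is hypothetical, and whether SettingWithCopyWarning shows up depends on the pandas version (releases that use Copy-on-Write may not emit it at all):

```python
import warnings

import numpy as np
import pandas as pd

data = pd.DataFrame({'Correlation': np.arange(0, 1.3, 0.1)})
df = data.iloc[0:11]  # slice of `data`, as in the example above

# Record every warning raised while the new column is assigned
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    df['Category'] = np.where(df['Correlation'] >= 0.5, 'high',
        np.where(df['Correlation'] >= 0.3, 'medium', 'low'))

# `warning_names` is a hypothetical helper variable for this illustration
warning_names = [w.category.__name__ for w in caught]
print(warning_names)
print(df['Category'].tolist())
```

Either way, the new column ends up in df only; data itself does not gain a Category column.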

To get rid of it, create df as an independent DataFrame, e.g.:

df = data.iloc[0:11].copy()

Now df uses its own data buffer, and you can do whatever you wish with it, including adding new columns.
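Put together, the corrected workflow might look like the sketch below. It assumes the same example data; the shares_buffer check is an illustrative addition, not part of the original answer:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'Correlation': np.arange(0, 1.3, 0.1)})
df = data.iloc[0:11].copy()  # independent copy with its own buffer

# Categorise the correlations: >= 0.5 is high, >= 0.3 is medium, else low
df['Category'] = np.where(df['Correlation'] >= 0.5, 'high',
    np.where(df['Correlation'] >= 0.3, 'medium', 'low'))

# Illustrative check (hypothetical variable name): the copy should no
# longer share memory with the original frame
shares_buffer = np.shares_memory(
    df['Correlation'].to_numpy(),
    data['Correlation'].to_numpy(),
)
print(shares_buffer)
print(df['Category'].value_counts())
```

The cost of .copy() is one extra allocation for the sliced rows, which is usually negligible next to the ambiguity it removes.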

To check whether your df uses its own data buffer, run:

df._is_view

In your original environment (without my correction) you should get True, because df is a view of data, but after creating df with .copy() you should get False. Note that _is_view is a private pandas attribute, so its behaviour may differ between versions.
