
Replace values in a dataframe column based on condition

I have a seemingly easy task: a dataframe with two columns, A and B. Wherever a value in B is larger than the corresponding value in A, I want to replace it with the value from A. I used to do this with df.B[df.B > df.A] = df.A, but after a recent pandas upgrade this chained assignment started raising a SettingWithCopyWarning. The official documentation recommends using .loc.

Okay, I said, and did it through df.loc[df.B > df.A, 'B'] = df.A, and it all works fine unless column B consists entirely of NaN values. Then something weird happens:

In [1]: df = pd.DataFrame({'A': [1, 2, 3], 'B': [np.nan, np.nan, np.nan]})

In [2]: df
Out[2]: 
   A   B
0  1 NaN
1  2 NaN
2  3 NaN

In [3]: df.loc[df.B > df.A, 'B'] = df.A

In [4]: df
Out[4]: 
   A                    B
0  1 -9223372036854775808
1  2 -9223372036854775808
2  3 -9223372036854775808

Now, if even one of B's elements satisfies the condition (larger than A), then it all works fine:

In [1]: df = pd.DataFrame({'A': [1, 2, 3], 'B': [np.nan, 4, np.nan]})

In [2]: df
Out[2]: 
   A   B
0  1 NaN
1  2   4
2  3 NaN

In [3]: df.loc[df.B > df.A, 'B'] = df.A

In [4]: df
Out[4]: 
   A   B
0  1 NaN
1  2   2
2  3 NaN

But if none of B's elements satisfy the condition, then all NaNs get replaced with -9223372036854775808:

In [1]: df = pd.DataFrame({'A': [1, 2, 3], 'B': [np.nan, 1, np.nan]})

In [2]: df
Out[2]: 
   A   B
0  1 NaN
1  2   1
2  3 NaN

In [3]: df.loc[df.B > df.A, 'B'] = df.A

In [4]: df
Out[4]: 
   A                    B
0  1 -9223372036854775808
1  2                    1
2  3 -9223372036854775808

Is this a bug or a feature? How should I have done this replacement?

Thank you!

This is a bug, fixed here.

Since pandas allows basically anything to be set on the right-hand side of an expression in loc, there are probably 10+ cases that need to be disambiguated. To give you an idea:

df.loc[lhs, column] = rhs 

where rhs could be: list, array, scalar, and lhs could be: slice, tuple, scalar, array

and a small subset of cases where the resulting dtype of the column needs to be inferred / set according to the rhs. (This is a bit complicated.) For example, say you don't set all of the elements on the lhs and it was integer; then you need to coerce to float. But if you did set all of the elements AND the rhs was an integer, then it needs to be coerced BACK to integer.
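The same upcasting rule can be observed without going through loc at all. The sketch below uses Series.where as a stand-in (an assumption made for illustration; it is not the code path that had the bug): when only part of an integer column is kept, the gaps become NaN and the result has to be float, while a fully kept column can stay integer.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])  # an int64 column

# Only some elements are kept; the rest become NaN, so the result is float64.
partial = ints.where(ints > 1)

# Every element is kept; nothing forces a float, so the dtype stays int64.
full = ints.where(ints > 0)

print(ints.dtype, partial.dtype, full.dtype)
```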

In this particular case, the lhs is an array, so we would normally try to coerce the lhs to the type of the rhs, but this case degenerates if we have an unsafe conversion (float -> int, since an integer column cannot hold NaN).

Suffice it to say this was a missing edge case.
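To see why the direction of the conversion matters, NumPy's own casting rules can be queried. This is only an illustrative check, not part of the pandas fix. The number in the question's output is the smallest int64 value, which is what a NaN commonly turns into when it is force-cast to int64 (the exact result is platform dependent).

```python
import numpy as np

# int64 -> float64 is a cast NumPy classifies as "safe" ...
int_to_float = np.can_cast(np.int64, np.float64, casting="safe")

# ... while float64 -> int64 is not: NaN has no integer representation.
float_to_int = np.can_cast(np.float64, np.int64, casting="safe")

# The value seen in the question's output is the smallest int64.
int64_min = np.iinfo(np.int64).min

print(int_to_float, float_to_int, int64_min)
```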
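As for how to do the replacement itself, one option that avoids the setitem dtype inference altogether is Series.mask. This is a minimal sketch of an alternative, not the fix referred to above; the helper name cap_b_at_a is hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd


def cap_b_at_a(df):
    """Return a copy of df where B is replaced by A wherever B > A."""
    out = df.copy()
    exceeds = out["B"] > out["A"]  # comparisons with NaN evaluate to False
    # mask() replaces values where the condition is True; NaN rows are untouched.
    out["B"] = out["B"].mask(exceeds, out["A"])
    return out


no_match = cap_b_at_a(pd.DataFrame({"A": [1, 2, 3], "B": [np.nan, 1, np.nan]}))
one_match = cap_b_at_a(pd.DataFrame({"A": [1, 2, 3], "B": [np.nan, 4, np.nan]}))
all_nan = cap_b_at_a(pd.DataFrame({"A": [1, 2, 3], "B": [np.nan, np.nan, np.nan]}))

print(no_match, one_match, all_nan, sep="\n\n")
```

In all three of the question's cases column B stays float, so the NaN values survive whether or not any row satisfies the condition.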

