pandas equivalent of np.where

Question

np.where has the semantics of a vectorized if/else (similar to Apache Spark's when / otherwise DataFrame method). I know that I can use np.where on a pandas.Series, but pandas often defines its own API to use instead of raw numpy functions, and that API is usually more convenient with pd.Series / pd.DataFrame.

Sure enough, I found pandas.DataFrame.where. However, at first glance, it has completely different semantics. I could not find a way to rewrite the most basic example of np.where using pandas where:

# df is pd.DataFrame
# how to write this using df.where?
df['C'] = np.where((df['A']<0) | (df['B']>0), df['A']+df['B'], df['A']/df['B'])

Am I missing something obvious? Or is pandas' where intended for a completely different use case, despite having the same name as np.where?

Answer 1

Try:

(df['A'] + df['B']).where((df['A'] < 0) | (df['B'] > 0), df['A'] / df['B'])

The difference between numpy's where and DataFrame's where is that, in pandas, the values used where the condition is True are supplied by the DataFrame that the where method is called on (docs).

I.e.,

np.where(m, A, B)

is roughly equivalent to

A.where(m, B)
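
As a quick check of that equivalence, here is a minimal sketch on made-up data (the values in df are an assumption for illustration; the column names A and B follow the question). It also shows what happens when other is left out:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({"A": [-1.0, 2.0, 3.0, -4.0], "B": [2.0, -4.0, 6.0, -8.0]})

cond = (df["A"] < 0) | (df["B"] > 0)

# numpy: returns a plain ndarray, the index is dropped
via_numpy = np.where(cond, df["A"] + df["B"], df["A"] / df["B"])

# pandas: returns a Series that keeps the index
via_pandas = (df["A"] + df["B"]).where(cond, df["A"] / df["B"])

# Both approaches produce the same values
assert np.array_equal(via_numpy, via_pandas.to_numpy())

# Without `other`, positions where cond is False become NaN
only_matches = (df["A"] + df["B"]).where(cond)
```

The practical difference is the return type: np.where gives back an ndarray, while the where method gives back a Series aligned on the original index.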

If you wanted a similar call signature using pandas, you could take advantage of the way method calls work in Python:

pd.DataFrame.where(cond=(df['A'] < 0) | (df['B'] > 0), self=df['A'] + df['B'], other=df['A'] / df['B'])

or without kwargs (note that the positional order of the arguments is different from the numpy where argument order):

pd.DataFrame.where(df['A'] + df['B'], (df['A'] < 0) | (df['B'] > 0), df['A'] / df['B'])
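
If the np.where argument order is what you are after, a small wrapper may read better than calling the method unbound. This is only a sketch: where_like is a hypothetical helper name, not part of pandas, and the sample data is made up. Since the operands here are Series, the unbound call below goes through pd.Series.where rather than pd.DataFrame.where.

```python
import pandas as pd


def where_like(cond, x, y):
    """Hypothetical helper: np.where argument order, pandas result."""
    return x.where(cond, y)


# Hypothetical sample data
df = pd.DataFrame({"A": [-1.0, 2.0], "B": [2.0, -4.0]})
cond = (df["A"] < 0) | (df["B"] > 0)

wrapped = where_like(cond, df["A"] + df["B"], df["A"] / df["B"])

# Unbound method call: the first positional argument plays the role of self
unbound = pd.Series.where(df["A"] + df["B"], cond, df["A"] / df["B"])

assert wrapped.equals(unbound)
```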

Answer 2

I prefer using pandas' mask over where since it is less counterintuitive (at least for me).

(df['A']/df['B']).mask((df['A']<0) | (df['B']>0), df['A']+df['B'])

Here, columns A and B are added where the condition holds; otherwise their ratio stays untouched.
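
To make the relationship between the two methods concrete, here is a minimal sketch on made-up data (the values are an assumption for illustration): mask replaces values where the condition is True, where keeps them, so swapping the caller and the other argument gives the same result.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({"A": [-1.0, 2.0, 3.0], "B": [2.0, -4.0, 6.0]})
cond = (df["A"] < 0) | (df["B"] > 0)

total = df["A"] + df["B"]
ratio = df["A"] / df["B"]

# mask: start from ratio, replace with total where cond is True
masked = ratio.mask(cond, total)

# where: start from total, keep it where cond is True, else use ratio
kept = total.where(cond, ratio)

assert masked.equals(kept)
```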

Answer 1: 61 (accepted), 2016-07-26 01:15:44

Answer 2: 2, 2022-05-26 19:39:43
