pandas equivalent of np.where
np.where has the semantics of a vectorized if/else (similar to Apache Spark's when / otherwise DataFrame method). I know that I can use np.where on pandas.Series, but pandas often defines its own API to use instead of raw numpy functions, which is usually more convenient with pd.Series / pd.DataFrame.
Sure enough, I found pandas.DataFrame.where. However, at first glance, it has completely different semantics. I could not find a way to rewrite the most basic example of np.where using pandas where:
# df is pd.DataFrame
# how to write this using df.where?
df['C'] = np.where((df['A']<0) | (df['B']>0), df['A']+df['B'], df['A']/df['B'])
Am I missing something obvious? Or is pandas' where intended for a completely different use case, despite having the same name as np.where?
Try:
(df['A'] + df['B']).where((df['A'] < 0) | (df['B'] > 0), df['A'] / df['B'])
The difference between the numpy where and the DataFrame where is that, in pandas, the values used where the condition is True are supplied by the DataFrame that the where method is called on (docs).
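As a minimal sketch of that point (the Series s below is hypothetical sample data, not from the question): the object that where is called on supplies the values kept where the condition is True, and the optional second argument supplies the replacements, which default to NaN.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
s = pd.Series([1.0, -2.0, 3.0])

# Values of s are kept where the condition is True;
# everything else becomes NaN because no replacement is given.
kept = s.where(s > 0)

# With a second argument, the replacement is taken from it instead.
filled = s.where(s > 0, 0.0)

print(kept)
print(filled)
```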
I.e.
np.where(m, A, B)
is roughly equivalent to
A.where(m, B)
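A small check of this equivalence, assuming a hypothetical DataFrame with the columns A and B from the question. The "roughly" matters: np.where returns a plain NumPy array, while Series.where returns a Series that keeps the index, so the NumPy result is wrapped in a Series here before comparing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({"A": [-1.0, 2.0, 3.0], "B": [4.0, -5.0, 6.0]})

m = (df["A"] < 0) | (df["B"] > 0)  # boolean condition
a = df["A"] + df["B"]              # values used where m is True
b = df["A"] / df["B"]              # values used where m is False

# np.where yields an ndarray, so it is wrapped to compare like with like.
via_numpy = pd.Series(np.where(m, a, b), index=df.index)
via_pandas = a.where(m, b)

# Both approaches should pick the same element in every row.
assert via_numpy.equals(via_pandas)
```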
If you wanted a similar call signature using pandas, you could take advantage of the way method calls work in Python:
pd.DataFrame.where(cond=(df['A'] < 0) | (df['B'] > 0), self=df['A'] + df['B'], other=df['A'] / df['B'])
or without kwargs (note that the positional order of the arguments is different from the numpy where argument order):
pd.DataFrame.where(df['A'] + df['B'], (df['A'] < 0) | (df['B'] > 0), df['A'] / df['B'])
I prefer using pandas' mask over where since it is less counterintuitive (at least for me).
(df['A']/df['B']).mask((df['A']<0) | (df['B']>0), df['A']+df['B'])
Here, columns A and B are added where the condition holds; otherwise their ratio stays untouched.
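To illustrate how mask relates to where, here is a short sketch on hypothetical sample data: mask replaces values where the condition is True, whereas where replaces them where it is False, so swapping the two operands should give the same result.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({"A": [-1.0, 2.0, 3.0], "B": [4.0, -5.0, 6.0]})
cond = (df["A"] < 0) | (df["B"] > 0)

# mask: start from the ratio and overwrite it with the sum where cond is True.
masked = (df["A"] / df["B"]).mask(cond, df["A"] + df["B"])

# where: start from the sum and overwrite it with the ratio where cond is False.
kept = (df["A"] + df["B"]).where(cond, df["A"] / df["B"])

assert masked.equals(kept)
```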