Nested np.where
I have the following dataframe:
| S | A |
|---|---|
| 1 | 1 |
| 1 | 0 |
| 2 | 1 |
| 2 | 0 |
I wanted to create a new 'Result' column that is calculated based on the values of both column A and column S.
I wrote the following nested np.where code:
df['Result'] = np.where((df.S == 1 & df.A == 1), 1,
(df.S == 1 & df.A == 0), 0,
(df.S == 2 & df.A == 1), 0,
(df.S == 2 & df.A == 0), 1))))
But when I execute it, I get the following error:
SyntaxError: invalid syntax
What is wrong with my code?
As far as I know, np.where does not support more than two results: np.where(condition, x, y) takes a single condition, the value to use where it is True and the value to use where it is False. (The SyntaxError itself comes from the unbalanced parentheses: the last line closes four parentheses after the 1, but only the single np.where( call is still open.) So either you rewrite your np.where to use one condition and return 1/0 for True/False, or you need to use masks.
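A minimal sketch of the np.where(condition, x, y) signature on the question's data. It also shows a second problem in the original code: in Python, & binds more tightly than ==, so df.S == 1 & df.A == 1 is parsed as the chained comparison df.S == (1 & df.A) == 1, which forces pandas to take the truth value of a whole Series and raises a ValueError. Wrapping each comparison in its own parentheses avoids this. The names only_first and raised are purely illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"S": [1, 1, 2, 2], "A": [1, 0, 1, 0]})

# np.where(condition, x, y): one boolean condition, one value for True, one for False
only_first = np.where((df.S == 1) & (df.A == 1), 1, 0)
print(only_first)  # [1 0 0 0]

# Without parentheses, & is evaluated before ==, giving a chained comparison
# that needs the truth value of a whole Series, which pandas refuses
raised = False
try:
    np.where(df.S == 1 & df.A == 1, 1, 0)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```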
If you rewrite np.where this way, you are limited to two results, and the second result is always used wherever the condition is not True. So it will also be set for rows such as S == 5 or A = NaN, which match none of your four cases.
df['Result'] = np.where(((df.S == 1) & (df.A == 1)) | ((df.S == 2) & (df.A == 0)), 1, 0)
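To check this fall-through behavior, the sketch below appends one hypothetical row (S = 5, A = NaN) that matches none of the four cases; the single np.where above still assigns it 0.

```python
import numpy as np
import pandas as pd

# Question's data plus one hypothetical row that matches none of the four cases
df = pd.DataFrame({"S": [1, 1, 2, 2, 5], "A": [1, 0, 1, 0, np.nan]})

# Single condition: every row that is not "1" falls through to 0, including the last one
df["Result"] = np.where(((df.S == 1) & (df.A == 1)) | ((df.S == 2) & (df.A == 0)), 1, 0)
print(df)
```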
When using masks, you can apply an arbitrary number of conditions and results. For your example, the solution looks like:
mask_0 = ((df.S == 1) & (df.A == 0)) | ((df.S == 2) & (df.A == 1))
mask_1 = ((df.S == 1) & (df.A == 1)) | ((df.S == 2) & (df.A == 0))
df.loc[mask_0, 'Result'] = 0
df.loc[mask_1, 'Result'] = 1
Result will be set to np.nan wherever no condition is met. This is, IMHO, fail-safe and should therefore be preferred. But if you want zeros in those locations, just initialize your Result column with zeros first.
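A small sketch of both defaults for the mask approach, using a hypothetical extra row with S = 5 that matches neither mask. Without initialization, the new column is created with NaN for unmatched rows (and therefore gets a float dtype); initializing it with zeros first makes those rows 0 instead. nan_default and zero_default are illustrative names.

```python
import pandas as pd

# Question's data plus a hypothetical row (S = 5) that matches neither mask
df = pd.DataFrame({"S": [1, 1, 2, 2, 5], "A": [1, 0, 1, 0, 1]})

mask_0 = ((df.S == 1) & (df.A == 0)) | ((df.S == 2) & (df.A == 1))
mask_1 = ((df.S == 1) & (df.A == 1)) | ((df.S == 2) & (df.A == 0))

# Variant 1: no initialization, so unmatched rows stay NaN (column becomes float)
nan_default = df.copy()
nan_default.loc[mask_0, "Result"] = 0
nan_default.loc[mask_1, "Result"] = 1

# Variant 2: initialize with zeros first, so unmatched rows end up as 0
zero_default = df.copy()
zero_default["Result"] = 0
zero_default.loc[mask_0, "Result"] = 0
zero_default.loc[mask_1, "Result"] = 1

print(nan_default)
print(zero_default)
```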
Of course, this can be simplified for special cases such as having only 1 and 0 as results, and extended to any number of results by using dicts or other containers.
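One way the dict-based extension could look, as a sketch: a hypothetical rules dict maps each (S, A) pair to its result, and a loop builds one mask per entry. Adding a case means adding one dict entry; rows that match no key stay NaN, as in the mask version above.

```python
import numpy as np
import pandas as pd

# Question's data plus a hypothetical row (S = 5) with no matching rule
df = pd.DataFrame({"S": [1, 1, 2, 2, 5], "A": [1, 0, 1, 0, 1]})

# Hypothetical lookup table: (S, A) -> Result; extend with more entries as needed
rules = {(1, 1): 1, (1, 0): 0, (2, 1): 0, (2, 0): 1}

df["Result"] = np.nan  # unmatched rows keep NaN
for (s, a), value in rules.items():
    mask = (df.S == s) & (df.A == a)
    df.loc[mask, "Result"] = value

print(df)
```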
You should use nested np.where. It works like an SQL CASE expression. But be careful when there are NaNs in the data.
import numpy as np
import pandas as pd
df = pd.DataFrame({'S': [1, 1, 2, 2], 'A': [1, 0, 1, 0]})
df['Result'] = np.where((df.S == 1) & (df.A == 1), 1, #when... then
np.where((df.S == 1) & (df.A == 0), 0, #when... then
np.where((df.S == 2) & (df.A == 1), 0, #when... then
1))) #else
df
| | S | A | Result |
|---|---|---|--------|
| 0 | 1 | 1 | 1 |
| 1 | 1 | 0 | 0 |
| 2 | 2 | 1 | 0 |
| 3 | 2 | 0 | 1 |
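To illustrate the NaN caveat, the sketch below adds a hypothetical row with S = 1 and A = NaN. Every comparison with NaN is False, so the row drops through to the final else value 1. Checking isna() first is one possible guard (an assumption here, not part of the original answer); the Guarded column name is illustrative.

```python
import numpy as np
import pandas as pd

# Question's data plus one hypothetical row with a missing A value
df = pd.DataFrame({"S": [1, 1, 2, 2, 1], "A": [1, 0, 1, 0, np.nan]})

# Same nested np.where: NaN fails every comparison, so the last row reaches the else branch
df["Result"] = np.where((df.S == 1) & (df.A == 1), 1,
               np.where((df.S == 1) & (df.A == 0), 0,
               np.where((df.S == 2) & (df.A == 1), 0,
               1)))

# One possible guard: test for missing values first and keep them as NaN
df["Guarded"] = np.where(df.A.isna(), np.nan, df["Result"])
print(df)
```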
I would recommend using numpy.select if you have deeply nested operations.
import numpy as np
import pandas as pd
df = pd.DataFrame({
"S": [1, 1, 2, 2],
"A": [1, 0, 1, 0]
})
# conditions 1 and 4 (result 1), and conditions 2 and 3 (result 0), could of course be combined with the '|' (or) operator
df['RESULT'] = np.select([
(df.S == 1) & (df.A == 1),
(df.S == 1) & (df.A == 0),
(df.S == 2) & (df.A == 1),
(df.S == 2) & (df.A == 0)
], [1, 0, 0, 1])
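A variant sketch that combines the cases with | as the comment suggests, and sets np.select's default parameter (which is 0 if omitted) to NaN, so rows matching no condition, such as the hypothetical S = 5 row added here, are not silently given a valid-looking result.

```python
import numpy as np
import pandas as pd

# Question's data plus a hypothetical row (S = 5) that matches no condition
df = pd.DataFrame({"S": [1, 1, 2, 2, 5], "A": [1, 0, 1, 0, 1]})

conditions = [
    ((df.S == 1) & (df.A == 1)) | ((df.S == 2) & (df.A == 0)),  # cases 1 and 4
    ((df.S == 1) & (df.A == 0)) | ((df.S == 2) & (df.A == 1)),  # cases 2 and 3
]
choices = [1, 0]

# default is used wherever no condition is True; NaN makes the result column float
df["RESULT"] = np.select(conditions, choices, default=np.nan)
print(df)
```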