
Nested np.where

I have the following dataframe:

S A
1 1
1 0
2 1
2 0

I want to create a new 'Result' column that is calculated from the values of both column A and column S.

I wrote the following nested np.where code:

df['Result'] = np.where((df.S == 1 & df.A == 1), 1,
                        (df.S == 1 & df.A == 0), 0,
                        (df.S == 2 & df.A == 1), 0,
                        (df.S == 2 & df.A == 0), 1))))

but when I execute it, I get the following error:

SyntaxError: invalid syntax

What is wrong with my code?

As far as I know, np.where does not support multiple conditions with multiple return values: it takes exactly one condition and two values, one used where the condition is True and one where it is False. So either you rewrite your np.where call so that a single condition returns 1 where it is True and 0 where it is False, or you need to use masks.
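As a side note on the snippet from the question, here is a minimal sketch of two things that go wrong before the number of arguments even matters. The surplus closing parentheses are enough to trigger the SyntaxError, and without parentheses around each comparison, & binds more tightly than ==, so pandas would raise a ValueError instead. The shortened source string and the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"S": [1, 1, 2, 2], "A": [1, 0, 1, 0]})

# 1) Unbalanced parentheses: Python rejects the source before running it.
#    (Shortened stand-in for the question's code; names need not exist.)
bad_source = "x = np.where((a == 1), 1, (a == 2), 0))))"
try:
    compile(bad_source, "<question>", "exec")
    syntax_error_seen = False
except SyntaxError:
    syntax_error_seen = True

# 2) Precedence: `df.S == 1 & df.A == 1` is parsed as
#    `df.S == (1 & df.A) == 1`, a chained comparison that needs bool(Series).
try:
    df.S == 1 & df.A == 1
    precedence_error_seen = False
except ValueError:
    precedence_error_seen = True

# With parentheses, each comparison is evaluated first, then combined.
both_one = (df.S == 1) & (df.A == 1)
print(syntax_error_seen, precedence_error_seen, both_one.tolist())
```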

If you rewrite np.where, you are limited to two results, and the second result is always used wherever the condition is not True. So it will also be set for rows such as S == 5 or rows where A is np.nan:

df['Result'] = np.where(((df.S == 1) & (df.A == 1)) | ((df.S == 2) & (df.A == 0)), 1, 0)
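To make the caveat concrete, the sketch below runs the same expression on a frame with two extra rows that are not part of the question (S == 5, and A missing). Both rows are hypothetical; they only show that anything not matching the condition silently receives 0.

```python
import numpy as np
import pandas as pd

# The last two rows are made up and match none of the four original cases.
df = pd.DataFrame({"S": [1, 1, 2, 2, 5, 1],
                   "A": [1, 0, 1, 0, 1, np.nan]})

# Single condition: True for the two combinations that should yield 1.
is_one = ((df.S == 1) & (df.A == 1)) | ((df.S == 2) & (df.A == 0))

# Everything else, including the unexpected rows, falls back to 0.
df["Result"] = np.where(is_one, 1, 0)
print(df)
```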

When using masks, you can apply an arbitrary number of conditions and results. For your example, the solution looks like:

mask_0 = ((df.S == 1) & (df.A == 0)) | ((df.S == 2) & (df.A == 1))
mask_1 = ((df.S == 1) & (df.A == 1)) | ((df.S == 2) & (df.A == 0))
df.loc[mask_0, 'Result'] = 0
df.loc[mask_1, 'Result'] = 1

Results will be np.nan wherever no condition is met. This is, imho, the fail-safe behavior and should therefore be preferred. But if you want zeros in these locations, just initialize your 'Result' column with zeros.
Of course, this can be simplified for special cases, such as having only 1 and 0 as results, and extended to any number of results by using dicts or other containers.
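One possible way to extend the mask approach with a dict is sketched below. The `masks` dict and the extra S == 5 row are assumptions added for illustration; the loop simply repeats the df.loc assignment once per result value.

```python
import numpy as np
import pandas as pd

# The last row is hypothetical and matches no mask.
df = pd.DataFrame({"S": [1, 1, 2, 2, 5], "A": [1, 0, 1, 0, 1]})

# Hypothetical container: result value -> boolean mask selecting its rows.
masks = {
    0: ((df.S == 1) & (df.A == 0)) | ((df.S == 2) & (df.A == 1)),
    1: ((df.S == 1) & (df.A == 1)) | ((df.S == 2) & (df.A == 0)),
}

# Start from NaN so unmatched rows stay visible; use 0 here to default to zero.
df["Result"] = np.nan
for value, mask in masks.items():
    df.loc[mask, "Result"] = value
print(df)
```

Because the column starts as NaN, it is stored as float; the unmatched row keeps NaN and is easy to spot with df['Result'].isna().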

You should use nested np.where. It works like a SQL CASE expression. But be careful when there is NaN in the data.

import numpy as np
import pandas as pd

df = pd.DataFrame({'S': [1, 1, 2, 2], 'A': [1, 0, 1, 0]})
df['Result'] = np.where((df.S == 1) & (df.A == 1), 1,   #when... then
                 np.where((df.S == 1) & (df.A == 0), 0,  #when... then
                  np.where((df.S == 2) & (df.A == 1), 0,  #when... then
                    1)))                                  #else
df

Output:

|   | S | A | Result |
|---|---|---|--------|
| 0 | 1 | 1 | 1      |
| 1 | 1 | 0 | 0      |
| 2 | 2 | 1 | 0      |
| 3 | 2 | 0 | 1      |
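The NaN caveat can be illustrated with a small sketch. The row with a missing A is made up; in the nested form above it fails every comparison and lands in the final else branch, so it gets 1. One way to guard against that, shown here as an assumption rather than a required pattern, is to spell out the last case and use np.nan as the else value.

```python
import numpy as np
import pandas as pd

# The last row is hypothetical: A is missing.
df = pd.DataFrame({"S": [1, 2, 2], "A": [1, 0, np.nan]})

c1 = (df.S == 1) & (df.A == 1)
c2 = (df.S == 1) & (df.A == 0)
c3 = (df.S == 2) & (df.A == 1)
c4 = (df.S == 2) & (df.A == 0)

# Same shape as the nested call above: the NaN row falls through to else (1).
nested = np.where(c1, 1, np.where(c2, 0, np.where(c3, 0, 1)))

# Guarded variant: every case is explicit, and unmatched rows become NaN.
guarded = np.where(c1, 1,
           np.where(c2, 0,
            np.where(c3, 0,
             np.where(c4, 1, np.nan))))
print(nested, guarded)
```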

I would recommend using numpy.select if you have deeply nested conditions.

df = pd.DataFrame({
    "S": [1, 1, 2, 2],
    "A": [1, 0, 1, 0]
})

# the first and fourth conditions (result 1) and the second and third
# (result 0) could of course be combined with the '|' (or) operator
df['Result'] = np.select([
    (df.S == 1) & (df.A == 1),
    (df.S == 1) & (df.A == 0),
    (df.S == 2) & (df.A == 1),
    (df.S == 2) & (df.A == 0)
], [1, 0, 0, 1])
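Note that np.select fills rows matching none of the conditions with its default argument, which is 0 unless specified. The sketch below, using a made-up S == 5 row, passes default=np.nan so that such rows stand out instead of blending in with a legitimate 0.

```python
import numpy as np
import pandas as pd

# The last row is hypothetical and matches none of the conditions.
df = pd.DataFrame({"S": [1, 1, 2, 2, 5], "A": [1, 0, 1, 0, 1]})

conditions = [
    (df.S == 1) & (df.A == 1),
    (df.S == 1) & (df.A == 0),
    (df.S == 2) & (df.A == 1),
    (df.S == 2) & (df.A == 0),
]
choices = [1, 0, 0, 1]

# default applies where no condition is True; NaN makes the column float.
df["Result"] = np.select(conditions, choices, default=np.nan)
print(df)
```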
