
How to compare each cell in one column with a specific value in pandas?

I have a dataframe like the one below, and I want to achieve the following:

If the sign of A is the same as the sign of B, create a new column C = min(3, |A|); if the sign of A is different from the sign of B, C = min(3, B); if the values of A and B are zero, C = A.

Type subType  A       B     C     
 X    a       -1      4     3   
 X    a       5       9     3
 X    a       5       9     3
 X    b       1       4     1   
 X    b       3       5    ...
 X    b       5       0
 Y    a       -1      1         
 Y    a       3       2  
 Y    a       -5      3
 Y    b       1       4        
 Y    b       3       5 
 Y    b       5      -2

I tried:

if df["A"] * df["B"] > 0:
    df["C"] = (3, abs(df["A"]).min(axis=1)

It gave me an error; it seems I can't compare the value 3 with a column directly. Any suggestions?
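The `if` statement is where this kind of attempt breaks down: `df["A"] * df["B"] > 0` is not one True/False value but a boolean Series with one entry per row, and pandas refuses to collapse it into a single bool, raising `ValueError` ("The truth value of a Series is ambiguous"). A minimal sketch reproducing this, assuming a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical two-row frame, only used to reproduce the error
df = pd.DataFrame({"A": [-1, 5], "B": [4, 9]})

mask = df["A"] * df["B"] > 0   # boolean Series, one value per row
print(mask.tolist())           # [False, True]

try:
    if mask:                   # `if` needs a single bool, not a Series
        pass
except ValueError as exc:
    print("ValueError:", exc)
```

This is why the answers below build the row-wise result with vectorized functions such as `np.where` instead of an `if` statement.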

Follow-up: what if the formula is more complex, e.g. C = A + min(3, |A|) * B?

Because when A and B are both zero the first rule gives min(3, abs(0)), which is always 0 (the same as A), the solution can be simplified with numpy.where and numpy.minimum:

import numpy as np

# compare signs
m = np.sign(df["A"]) == np.sign(df["B"])
# alternative (a zero product also counts as a match, so row 5 would give 3 instead of 0)
#m = (df["A"] * df["B"]) >= 0
df['C'] = np.where(m, np.minimum(3, df.A.abs()), np.minimum(3, df.B))
print (df)
   Type subType  A  B  C
0     X       a -1  4  3
1     X       a  5  9  3
2     X       a  5  9  3
3     X       b  1  4  1
4     X       b  3  5  3
5     X       b  5  0  0
6     Y       a -1  1  1
7     Y       a  3  2  3
8     Y       a -5  3  3
9     Y       b  1  4  1
10    Y       b  3  5  3
11    Y       b  5 -2 -2
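A self-contained version of the same approach, rebuilding the question's sample frame so it can be run directly. It also checks that the commented-out product mask disagrees with the sign comparison only in row 5, where B is 0; `m_alt` is an illustrative name, not part of the answer:

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Type":    ["X"] * 6 + ["Y"] * 6,
    "subType": ["a", "a", "a", "b", "b", "b"] * 2,
    "A": [-1, 5, 5, 1, 3, 5, -1, 3, -5, 1, 3, 5],
    "B": [4, 9, 9, 4, 5, 0, 1, 2, 3, 4, 5, -2],
})

# Same sign -> min(3, |A|), otherwise -> min(3, B); np.minimum works row by row
m = np.sign(df["A"]) == np.sign(df["B"])
df["C"] = np.where(m, np.minimum(3, df["A"].abs()), np.minimum(3, df["B"]))
print(df["C"].tolist())                # [3, 3, 3, 1, 3, 0, 1, 3, 3, 1, 3, -2]

# The product-based mask treats a zero product as a match
m_alt = (df["A"] * df["B"]) >= 0
print(df.index[m != m_alt].tolist())   # [5]
```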

EDIT: If you need more conditions, you can use numpy.select instead of nesting multiple np.where calls:

m1 = np.sign(df.A) == np.sign(df.B)
m2 = np.sign(df.A) == np.sign(df.C)

s1 = df.A + np.minimum(3, df.A.abs()) * df.B
s2 = df.C + np.minimum(3, df.A.abs()) * df.B

df['D'] = np.select([m1, m2], [s1, s2], default=df.A)
print (df)
   Type subType  A  B  C   D
0     X       a -1  4  3  -1
1     X       a  5  9  3  32
2     X       a  5  9  3  32
3     X       b  1  4  1   5
4     X       b  3  5  3  18
5     X       b  5  0  0   5
6     Y       a -1  1  1  -1
7     Y       a  3  2  3   9
8     Y       a -5  3  3  -5
9     Y       b  1  4  1   5
10    Y       b  3  5  3  18
11    Y       b  5 -2 -2   5
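`np.select` checks the conditions in order and, for each element, takes the choice belonging to the first condition that is True, falling back to `default` when none is. So in the answer above, rows where both `m1` and `m2` hold get `s1`. A small sketch with made-up arrays (all names here are illustrative) showing that two conditions behave like two nested `np.where` calls:

```python
import numpy as np

c1 = np.array([True, True, False, False])
c2 = np.array([True, False, True, False])
x1 = np.array([10, 11, 12, 13])
x2 = np.array([20, 21, 22, 23])
default = np.array([0, 1, 2, 3])

# Per element: choice of the FIRST True condition, else default
selected = np.select([c1, c2], [x1, x2], default=default)

# Equivalent nested np.where form
nested = np.where(c1, x1, np.where(c2, x2, default))

print(selected.tolist())         # [10, 11, 22, 3]
print((selected == nested).all())  # True
```

With many conditions, the flat list form of `np.select` is usually easier to read than deeply nested `np.where` calls.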
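
Alternatively, use a list comprehension, or the vectorized equivalent with nested np.where and clip(upper=3). Here a zero product (row 5) falls through to C = A:

df['C'] = [min(abs(a), 3) if a*b > 0 else min(b, 3) if a*b < 0 else a for a, b in zip(df.A, df.B)]
# or
df['C'] = np.where(df.A.mul(df.B).gt(0), df.A.abs().clip(upper=3),
                   np.where(df.A.mul(df.B).lt(0), df.B.clip(upper=3), df.A))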


   Type subType  A  B  C
0     X       a -1  4  3
1     X       a  5  9  3
2     X       a  5  9  3
3     X       b  1  4  1
4     X       b  3  5  3
5     X       b  5  0  5
6     Y       a -1  1  1
7     Y       a  3  2  3
8     Y       a -5  3  3
9     Y       b  1  4  1
10    Y       b  3  5  3
11    Y       b  5 -2 -2
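To check that the list comprehension and the vectorized form agree, here is a self-contained sketch on columns A and B from the question's sample data; `loop_c`, `vec_c` and `prod` are illustrative names:

```python
import numpy as np
import pandas as pd

# Columns A and B from the question's sample data
df = pd.DataFrame({
    "A": [-1, 5, 5, 1, 3, 5, -1, 3, -5, 1, 3, 5],
    "B": [4, 9, 9, 4, 5, 0, 1, 2, 3, 4, 5, -2],
})

# Row-by-row Python version
loop_c = [min(abs(a), 3) if a * b > 0 else min(b, 3) if a * b < 0 else a
          for a, b in zip(df.A, df.B)]

# Vectorized version: clip(upper=3) acts as an element-wise min(x, 3)
prod = df.A.mul(df.B)
vec_c = np.where(prod.gt(0), df.A.abs().clip(upper=3),
                 np.where(prod.lt(0), df.B.clip(upper=3), df.A))

print(vec_c.tolist())            # [3, 3, 3, 1, 3, 5, 1, 3, 3, 1, 3, -2]
print(loop_c == vec_c.tolist())  # True
```

The list comprehension is easy to read but runs a Python loop over every row; the `np.where` version works on whole columns at once, which generally scales better on large frames.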

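You can try nested np.where calls:

# np.minimum compares row by row; the built-in min(3, df["B"].min())
# would reduce the column to one number and give every matching row the same value
df["C"] = np.where(df["A"] * df["B"] > 0, np.minimum(3, df["A"].abs()),
                   np.where(df["A"] * df["B"] < 0, np.minimum(3, df["B"]),
                            df["A"]))

df
   Type subType  A  B  C
0     X       a -1  4  3
1     X       a  5  9  3
2     X       a  5  9  3
3     X       b  1  4  1
4     X       b  3  5  3
5     X       b  5  0  5
6     Y       a -1  1  1
7     Y       a  3  2  3
8     Y       a -5  3  3
9     Y       b  1  4  1
10    Y       b  3  5  3
11    Y       b  5 -2 -2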
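Whether each row gets its own value depends on using `np.minimum` rather than Python's built-in `min`. A small sketch with a few B values from the question (the Series `b` and the other names are illustrative) compares the three variants:

```python
import numpy as np
import pandas as pd

b = pd.Series([4, 9, 0, -2])   # a few B values from the question

# Element-wise minimum: one result per row
elementwise = np.minimum(3, b)
print(elementwise.tolist())    # [3, 3, 0, -2]

# Built-in min after reducing the column: one scalar shared by every row
scalar = min(3, b.min())
print(scalar)                  # -2

# Built-in min on the Series itself: `b < 3` yields a boolean Series,
# which Python cannot reduce to a single True/False
raised = False
try:
    min(3, b)
except ValueError:
    raised = True
print(raised)                  # True
```

The last case is the same "can't compare 3 with a column" problem from the question; `np.minimum` (or `Series.clip(upper=3)`) avoids it by comparing element by element.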

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.