Replacing multiple values on multiple conditions in DataFrame

Question

I have the following code, which produces a DataFrame with 7 columns and 40000 rows:

df = pd.DataFrame(np.random.random(size=(40000, 7)), columns=list('ABCDEFG'))

How do I replace every value less than 1/3 with "a", every value between 1/3 and 2/3 with "b", and every value above 2/3 (and below 1) with "c"? I have tried pd.cut(), but it seems to work on only one column at a time. I have also tried:

df[df <= 1/3] = "a"
df[(df > 1/3) & (df < 2/3)] = "b"
df[df > 2/3] = "c"

Answer 1

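You might be hitting an error in the second step: after the first step some cells hold the string "a", so the next comparison has to compare strings with a float. Try this:

    # Compute all three masks before changing any value.
    t1 = df <= 1/3
    t2 = (df > 1/3) & (df < 2/3)
    t3 = df > 2/3
    df[t1] = "a"
    df[t2] = "b"
    df[t3] = "c"

Here all the comparisons run first and are saved as boolean masks, and only then are the values replaced.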
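As a rough, self-contained sketch of the same idea on a small frame: the masks are built from the numeric data first, and the replacements go into a copy cast to object dtype. The cast, the 5x3 size, and the names low, mid, high and out are my own assumptions for illustration, not part of the answer above; the cast is there because recent pandas versions can complain when strings are written into float columns.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.random(size=(5, 3)), columns=list("ABC"))

# Build every mask from the numeric data before anything is overwritten.
low = df <= 1/3
mid = (df > 1/3) & (df < 2/3)
high = df >= 2/3

# Assumption: work on an object-dtype copy so the cells can hold strings.
out = df.astype(object)
out[low] = "a"
out[mid] = "b"
out[high] = "c"

print(out)
```

Because the masks come from the untouched df, the order of the three assignments no longer matters.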

Answer 2

Use applymap (see the pandas DataFrame.applymap documentation):

def remap(x):
    # Map a single numeric value to its label.
    if x <= 1/3:
        return 'a'
    elif x < 2/3:
        return 'b'
    else:
        return 'c'

# applymap returns a new DataFrame, so assign the result.
df = df.applymap(remap)

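Any time you want to replace each item in an array with another value, you usually want an element-wise map. Note that in pandas 2.1 and later applymap is deprecated in favor of DataFrame.map, which is called the same way.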
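A minimal sketch of the element-wise approach that should run on both older and newer pandas. The hasattr check and the names elementwise and labels are assumptions of mine for illustration; the idea is only to pick DataFrame.map where it exists and fall back to applymap otherwise.

```python
import numpy as np
import pandas as pd

def remap(x):
    # Map a single numeric value to its label.
    if x <= 1/3:
        return "a"
    elif x < 2/3:
        return "b"
    return "c"

np.random.seed(0)
df = pd.DataFrame(np.random.random(size=(5, 3)), columns=list("ABC"))

# Assumption: prefer DataFrame.map when available, else use applymap.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

# Both return a new DataFrame and leave df unchanged.
labels = elementwise(remap)
print(labels)
```

This calls a Python function once per cell, so on 40000 x 7 values it will likely be slower than a vectorized approach, but it is easy to read and extend.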

Answer 3

You can use np.select, which accepts as many conditions and choices as you need. df.lt is "less than", df.gt is "greater than", df.le is "less than or equal to", and df.ge is "greater than or equal to".

np.random.seed(0) # for reproducing same results
df = pd.DataFrame(np.random.random(size=(40000, 7)), columns=list('ABCDEFG'))
df.head()

          A         B         C         D         E         F         G
0  0.548814  0.715189  0.602763  0.544883  0.423655  0.645894  0.437587
1  0.891773  0.963663  0.383442  0.791725  0.528895  0.568045  0.925597
2  0.071036  0.087129  0.020218  0.832620  0.778157  0.870012  0.978618
3  0.799159  0.461479  0.780529  0.118274  0.639921  0.143353  0.944669
4  0.521848  0.414662  0.264556  0.774234  0.456150  0.568434  0.018790

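condlist = [df.lt(1/3), df.gt(1/3) & df.lt(2/3)]
choicelist = ['a', 'b']
# Values matching neither condition get the default, 'c'.
# Keep the numeric df intact and store the labels under a new name.
result = pd.DataFrame(np.select(condlist, choicelist, 'c'), columns=df.columns)
result.head()

    A   B   C   D   E   F   G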
0   b   c   b   b   b   b   b
1   c   c   b   c   b   b   c
2   a   a   a   c   c   c   c
3   c   b   c   a   b   a   c
4   b   b   a   c   b   b   a

Or use df.apply with pd.cut, which applies the binning to each column in turn:

# Using the same numeric df as above.
df.apply(pd.cut,
         bins=[0, 1/3, 2/3, 1],
         labels=['a', 'b', 'c']
        )

   A  B  C  D  E  F  G
0  b  c  b  b  b  b  b
1  c  c  b  c  b  b  c
2  a  a  a  c  c  c  c
3  c  b  c  a  b  a  c
4  b  b  a  c  b  b  a
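Since the question mentions that pd.cut seemed to work on only one column, here is a hedged sketch of another way around that: flatten the values to one dimension, cut once, and reshape back. The names flat and labels and the small 5x3 frame are assumptions for illustration, and include_lowest=True is my addition so that a value of exactly 0 would land in the first bin rather than become NaN.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.random(size=(5, 3)), columns=list("ABC"))

# pd.cut needs 1-D input, so flatten all cells into a single array first.
flat = pd.cut(
    df.to_numpy().ravel(),
    bins=[0, 1/3, 2/3, 1],
    labels=["a", "b", "c"],
    include_lowest=True,
)

# Convert the Categorical to a plain array and restore the original shape.
labels = pd.DataFrame(
    np.asarray(flat).reshape(df.shape),
    index=df.index,
    columns=df.columns,
)
print(labels)
```

Note that the bins are closed on the right here, so a value exactly equal to 1/3 is labeled "a" and one exactly equal to 2/3 is labeled "b".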
