Replacing multiple values on multiple conditions in DataFrame

I have the following code which produces a df with 7 columns and 40000 rows:

df = pd.DataFrame(np.random.random(size=(40000, 7)), columns=list('ABCDEFG'))

How do I replace every value below 1/3 with "a", every value between 1/3 and 2/3 with "b", and every value above 2/3 (and below 1) with "c"? I have tried pd.cut(), but it seems to work on only one column at a time. I have also tried:

df[df <= 1/3] = "a"
df[(df > 1/3) & (df < 2/3)] = "b"
df[df > 2/3] = "c"

You are probably hitting an error at the second step: after the first assignment the frame already contains strings, so the next comparison ends up comparing a string with a float. Try this instead:

    t1=df <= 1/3
    t2=(df > 1/3) & (df < 2/3)
    t3=df > 2/3
    df[t1]="a"
    df[t2]="b"
    df[t3]="c"

Here we compute all the comparisons first and save the resulting masks, and only then make the replacements.
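As a minimal sketch of the masks-first idea on a tiny made-up frame: the name `small`, its values, and the `astype(object)` cast are my assumptions, not part of the answer above. The cast is there because newer pandas versions may warn about, or reject, writing strings into float columns, and `>=` is used for the top bin so that a value of exactly 2/3 still gets a label.

```python
import pandas as pd

# Hypothetical 3x2 frame standing in for the 40000x7 one.
small = pd.DataFrame({"A": [0.10, 0.50, 0.90], "B": [0.70, 0.20, 0.40]})

# Build every mask while the frame still holds only floats.
low = small <= 1/3
mid = (small > 1/3) & (small < 2/3)
high = small >= 2/3  # assumption: include exactly 2/3 in the top bin

# Work on an object-dtype copy so string labels fit in the columns.
labeled = small.astype(object)
labeled[low] = "a"
labeled[mid] = "b"
labeled[high] = "c"

print(labeled)
```

Because the masks were computed from the untouched `small`, the order of the three assignments no longer matters.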

Use applymap (see the pandas applymap documentation):

def remap(x):
    if x <= 1/3:
        return 'a'
    elif x > 1/3 and x < 2/3:
        return 'b'
    else:
        return 'c'

df.applymap(remap)

Any time you want to replace the items in an array with other values, you usually want to use map.
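One thing to watch for: `DataFrame.applymap` was deprecated in pandas 2.1 in favour of `DataFrame.map`. A hedged sketch that works on either side of that change, using a small made-up frame (`small` and `elementwise` are illustrative names, not from the answer):

```python
import pandas as pd

def remap(x):
    # Same thresholds as the function above, written as a cascade.
    if x <= 1/3:
        return "a"
    elif x < 2/3:
        return "b"
    return "c"

# Hypothetical sample data.
small = pd.DataFrame({"A": [0.10, 0.50, 0.90], "B": [0.70, 0.20, 0.40]})

# Prefer DataFrame.map when it exists (pandas >= 2.1), else fall back to applymap.
elementwise = small.map if hasattr(small, "map") else small.applymap
labeled = elementwise(remap)

print(labeled)
```

Either way the call returns a new frame rather than changing the original in place, so assign the result if you want to keep it.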

You can use np.select, which accepts as many conditions and choices as you need. df.lt is less than, df.gt is greater than, df.le is less than or equal to, and df.ge is greater than or equal to.

np.random.seed(0) # for reproducing same results
df = pd.DataFrame(np.random.random(size=(40000, 7)), columns=list('ABCDEFG'))
df.head()

          A         B         C         D         E         F         G
0  0.548814  0.715189  0.602763  0.544883  0.423655  0.645894  0.437587
1  0.891773  0.963663  0.383442  0.791725  0.528895  0.568045  0.925597
2  0.071036  0.087129  0.020218  0.832620  0.778157  0.870012  0.978618
3  0.799159  0.461479  0.780529  0.118274  0.639921  0.143353  0.944669
4  0.521848  0.414662  0.264556  0.774234  0.456150  0.568434  0.018790

condlist = [df.lt(1/3), df.gt(1/3) & df.lt(2/3)]
choicelist = ['a', 'b']
df = pd.DataFrame(np.select(condlist, choicelist, 'c'), columns=df.columns)
df.head()
    A   B   C   D   E   F   G
0   b   c   b   b   b   b   b
1   c   c   b   c   b   b   c
2   a   a   a   c   c   c   c
3   c   b   c   a   b   a   c
4   b   b   a   c   b   b   a
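np.select returns a plain NumPy array, so the row and column labels have to be put back by hand. A small sketch of that, assuming a made-up frame `small` with a non-default index. It also relies on np.select taking the first matching condition, which lets the second condition be a simple "less than 2/3"; that is my variation, not the answer's exact condlist, and it puts a value of exactly 1/3 in "b".

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a custom index to show that labels survive.
small = pd.DataFrame(
    {"A": [0.10, 0.50, 0.90], "B": [0.70, 0.20, 0.40]},
    index=["x", "y", "z"],
)

# Conditions are checked in order; the first one that is True wins.
condlist = [small.lt(1/3).to_numpy(), small.lt(2/3).to_numpy()]
choicelist = ["a", "b"]

labeled = pd.DataFrame(
    np.select(condlist, choicelist, default="c"),
    index=small.index,      # restore the row labels
    columns=small.columns,  # restore the column labels
)

print(labeled)
```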

Or use df.apply with pd.cut

# Using the original numeric df from above (before it was overwritten with labels).
df.apply(pd.cut,
         bins=[0, 1/3, 2/3, 1], 
         labels=['a', 'b', 'c']
        )

   A  B  C  D  E  F  G
0  b  c  b  b  b  b  b
1  c  c  b  c  b  b  c
2  a  a  a  c  c  c  c
3  c  b  c  a  b  a  c
4  b  b  a  c  b  b  a
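A caveat with these bins: pd.cut intervals are closed on the right by default, so the first bin is (0, 1/3] and a value of exactly 0 (which np.random.random can in principle return) would come out as NaN. A minimal sketch of the difference `include_lowest=True` makes, on a made-up frame (`small`, `default` and `inclusive` are illustrative names):

```python
import pandas as pd

# Hypothetical sample data; note the exact 0.0 in column A.
small = pd.DataFrame({"A": [0.0, 0.50, 0.90], "B": [0.70, 0.20, 0.40]})

bins = [0, 1/3, 2/3, 1]
labels = ["a", "b", "c"]

# Default behaviour: first bin is (0, 1/3], so 0.0 falls outside every bin.
default = small.apply(pd.cut, bins=bins, labels=labels)

# include_lowest=True makes the first bin include its left edge.
inclusive = small.apply(pd.cut, bins=bins, labels=labels, include_lowest=True)

print(default)
print(inclusive)
```

The resulting columns are categorical rather than plain strings, which may or may not be what you want downstream.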
