
Replace a range of integer values in multiple columns of Pandas

I have the following dataframe:

Index  ColA  ColB  ColC  ColD
0         1     4    13   ABC
1        12     1    24   ABC
2        36    18     1   ABC
3        41    45     1   ABC

Now I'm looking for a simple command to transform the pandas df so that the values of ColA, ColB and ColC are replaced as follows:

for each row:
   if value in ColA <= 12 then 1
   if value in ColA > 12 and <= 24 then 2
   if value in ColA > 24 and <= 36 then 3
   if value in ColA > 36 then 4

(the same also for the other columns)

So the result would look like this:

Index  ColA  ColB  ColC  ColD
0         1     1     2   ABC
1         1     1     2   ABC
2         3     2     1   ABC
3         4     4     1   ABC

Is there a simple way to achieve this? :-)

Best regards, André
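The four rules in the question can also be written as a plain Python function and applied cell by cell. This is a minimal sketch, not taken from any answer: the helper name to_bucket is hypothetical, and the frame is rebuilt from the question's data.

```python
import pandas as pd

def to_bucket(value):
    """Hypothetical helper: map one value to 1-4 using the question's thresholds."""
    if value <= 12:
        return 1
    if value <= 24:  # value > 12 is already implied at this point
        return 2
    if value <= 36:  # value > 24 is already implied at this point
        return 3
    return 4         # value > 36

df = pd.DataFrame({
    "ColA": [1, 12, 36, 41],
    "ColB": [4, 1, 18, 45],
    "ColC": [13, 24, 1, 1],
    "ColD": ["ABC"] * 4,
})

cols = ["ColA", "ColB", "ColC"]  # ColD holds strings and is left alone
# apply() hands each column to the lambda; Series.map calls to_bucket per value
df[cols] = df[cols].apply(lambda s: s.map(to_bucket))
print(df)
```

Because it calls a Python function for every cell, this is slower than vectorized approaches on large frames, but it mirrors the if/then rules one to one.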

One option is to combine Series.between with DataFrame.loc.

For each column, select the rows whose values lie in a given range and overwrite them with the new value, then repeat this for every range.

import pandas as pd
import numpy as np

df = pd.DataFrame()

df["ColA"] = [1, 12, 32, 24]
df["ColB"] = [23, 11, 6, 45]
df["ColC"] = [10, 25, 3, 23]

print(df)

Output:

   ColA  ColB  ColC
0     1    23    10
1    12    11    25
2    32     6     3
3    24    45    23

Next, df['ColA'].between(0, 12) returns a boolean mask that is True for every row whose ColA value lies between 0 and 12 (both ends included), and df.loc[df['ColA'].between(0, 12), 'ColA'] = 1 assigns the new value 1 to exactly those rows of ColA.
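To see what the mask selects, here is that single-column step in isolation, a minimal sketch using the ColA values from the sample data above:

```python
import pandas as pd

df = pd.DataFrame({"ColA": [1, 12, 32, 24]})

# Boolean mask: True where 0 <= value <= 12 (between() is inclusive on both ends by default)
mask = df["ColA"].between(0, 12)
print(mask.tolist())  # [True, True, False, False]

# Overwrite only the rows selected by the mask, and only in column ColA
df.loc[mask, "ColA"] = 1
print(df["ColA"].tolist())  # [1, 1, 32, 24]
```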

That covers one range of ColA. To apply every range to every numeric column, loop over the columns:

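cols = ['ColA', 'ColB', 'ColC']  # numeric columns only
for col in cols:
    orig = df[col].copy()  # build every mask from the original values
    df.loc[orig.between(0, 12), col] = 1
    df.loc[orig.between(13, 24), col] = 2
    df.loc[orig.between(25, 36), col] = 3
    df.loc[orig > 36, col] = 4

print(df)

Output:

   ColA  ColB  ColC
0     1     2     1
1     1     1     3
2     3     1     1
3     2     4     2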
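If a rule for values above 36 is part of the loop, the order of the assignments matters: each .loc step rewrites the column, so a later mask computed on the modified column can pick up a value an earlier step just wrote. A small sketch of that pitfall on a made-up Series, with the rules deliberately applied in the wrong order, followed by the safer pattern of building all masks from the untouched values:

```python
import pandas as pd

s = pd.Series([5, 40])

# Wrong order: the "> 36 -> 4" rule runs first ...
wrong = s.copy()
wrong[wrong > 36] = 4
# ... so the 40 that just became 4 is caught again by the 0-12 rule
wrong[wrong.between(0, 12)] = 1
print(wrong.tolist())  # [1, 1]  (40 ended up as 1 instead of 4)

# Safer: compute every mask from the original values before assigning
orig = s.copy()
right = s.copy()
right[orig.between(0, 12)] = 1
right[orig > 36] = 4
print(right.tolist())  # [1, 4]
```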

A more general solution uses numpy.select:

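import numpy as np
import pandas as pd

df = pd.DataFrame({'ColA': [1, 12, 36, 41], 'ColB': [4, 1, 18, 45],
                   'ColC': [13, 24, 1, 1], 'ColD': ['ABC'] * 4})

cols = ['ColA','ColB','ColC']
m1 = df[cols] <= 12
m2 = df[cols] <= 24
m3 = df[cols] <= 36

df[cols] = np.select([m1, m2, m3], [1,2,3], default=4)
print(df)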
   ColA  ColB  ColC ColD
0     1     1     2  ABC
1     1     1     2  ABC
2     3     2     1  ABC
3     4     4     1  ABC
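np.select checks the conditions in list order and, for each element, takes the choice of the first condition that is True. That is why m2 only needs the upper bound 24: anything <= 12 has already matched m1. A tiny sketch on a plain NumPy array with made-up values:

```python
import numpy as np

x = np.array([5, 20, 30, 50])

# 20 satisfies both x <= 24 and x <= 36, but x <= 24 comes first, so it gets 2.
# 50 matches no condition and falls through to the default.
result = np.select([x <= 12, x <= 24, x <= 36], [1, 2, 3], default=4)
print(result)  # [1 2 3 4]
```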

Another solution, if you always need the values 1, 2, 3 and 4 with these thresholds:

Subtract 1, integer-divide by 12, then add 1. DataFrame.clip is applied first to limit the values to the range 1 to 37, so anything below 1 ends up in bucket 1 and anything above 36 in bucket 4:

cols = ['ColA','ColB','ColC']

df[cols] = (df[cols].clip(lower=1, upper=37) - 1) // 12 + 1
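One way to check that the arithmetic matches the thresholds is to run the same expression on the boundary values. A sketch on a plain Series (Series.clip takes the same lower/upper arguments as DataFrame.clip), with test values chosen around each threshold:

```python
import pandas as pd

# Values around each threshold, plus values outside the 1..37 clip range
s = pd.Series([-5, 0, 1, 12, 13, 24, 25, 36, 37, 100])

# clip to 1..37, shift to 0-based, bucket by 12, shift back to 1-based
buckets = (s.clip(lower=1, upper=37) - 1) // 12 + 1
print(buckets.tolist())  # [1, 1, 1, 1, 2, 2, 3, 3, 4, 4]
```

This relies on the values being integers, as in the question: a float such as 12.5 gives (12.5 - 1) // 12 + 1 = 1.0, even though it is greater than 12.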

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 