I have the following dataframe:
Index ColA ColB ColC ColD
0 1 4 13 ABC
1 12 1 24 ABC
2 36 18 1 ABC
3 41 45 1 ABC
Now I'm looking for a simple way to transform the pandas df so that the values of ColA, ColB, and ColC are recoded as follows:
for each row:
if value in ColA <= 12 then 1
if value in ColA > 12 and <= 24 then 2
if value in ColA > 24 and <= 36 then 3
if value in ColA > 36 then 4
(the same also for the other columns)
So the result would look like this:
Index ColA ColB ColC ColD
0 1 1 2 ABC
1 1 1 2 ABC
2 3 2 1 ABC
3 4 4 1 ABC
Is there a simple way to achieve this? :-)
Best regards, André
You can solve this with pandas' own selection functions.
The idea is to iterate over the columns and, for each column, overwrite every value that lies in a given range with its new value.
import pandas as pd
import numpy as np
df = pd.DataFrame()
df["ColA"] = [1, 12, 32, 24]
df["ColB"] = [23, 11, 6, 33]
df["ColC"] = [10, 25, 3, 23]
print(df)
Output:
ColA ColB ColC
0 1 23 10
1 12 11 25
2 32 6 3
3 24 33 23
Now we find all the rows of a column whose values lie in the given range with df['ColA'].between(0,12), and assign the new value to those rows with df.loc[df['ColA'].between(0,12), 'ColA'] = 1.
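A quick sketch of what between and the .loc assignment do on their own (the series values here are made up for illustration). By default both bounds are included, so 12 matches the first range and 13 does not. Because the ranges use integer bounds, a non-integer such as 12.5 would fall between two ranges and be left unchanged.

```python
import pandas as pd

# Illustrative values around the range boundaries
s = pd.Series([0, 12, 13, 24, 36, 37])

# between() includes both bounds by default, so 0 and 12 are in [0, 12] but 13 is not
mask = s.between(0, 12)
print(mask.tolist())  # [True, True, False, False, False, False]

# A boolean mask plus a column label in .loc assigns only to the matching rows
df = pd.DataFrame({"ColA": [1, 12, 32, 24]})
df.loc[df["ColA"].between(0, 12), "ColA"] = 1
print(df["ColA"].tolist())  # [1, 1, 32, 24]
```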
That handles ColA. To do the same for every column of the dataframe, loop over the columns:
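for col in df.columns:
    # labels written by earlier lines (1, 2, 3) never fall in a later range, so this order is safe
    df.loc[df[col].between(0,12), col] = 1
    df.loc[df[col].between(13,24), col] = 2
    df.loc[df[col].between(25,36), col] = 3
    df.loc[df[col] > 36, col] = 4
print(df)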
Output:
ColA ColB ColC
0 1 2 1
1 1 1 3
2 3 1 1
3 2 3 2
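The loop above walks over every column, so on the question's dataframe it would also reach the string column ColD, where comparing strings with integers fails. A minimal sketch, assuming only the numeric columns should be recoded: restrict the loop with DataFrame.select_dtypes, keeping the > 36 case.

```python
import pandas as pd

# Data from the question, including the non-numeric ColD
df = pd.DataFrame({
    "ColA": [1, 12, 36, 41],
    "ColB": [4, 1, 18, 45],
    "ColC": [13, 24, 1, 1],
    "ColD": ["ABC", "ABC", "ABC", "ABC"],
})

# Only loop over numeric columns, so ColD is left untouched
for col in df.select_dtypes(include="number").columns:
    df.loc[df[col].between(0, 12), col] = 1
    df.loc[df[col].between(13, 24), col] = 2
    df.loc[df[col].between(25, 36), col] = 3
    df.loc[df[col] > 36, col] = 4

print(df)
```

This reproduces the result table from the question. It still relies on the assignment order: the labels written by earlier steps (1, 2, 3) never fall into a later range.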
General solution with numpy.select, applied to the dataframe from the question:
cols = ['ColA','ColB','ColC']
m1 = df[cols] <= 12
m2 = df[cols] <= 24
m3 = df[cols] <= 36
df[cols] = np.select([m1, m2, m3], [1,2,3], default=4)
print(df)
ColA ColB ColC ColD
0 1 1 2 ABC
1 1 1 2 ABC
2 3 2 1 ABC
3 4 4 1 ABC
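Why the masks can simply be <= 12, <= 24, <= 36: the conditions overlap, and np.select takes the choice of the first condition that is True. A small sketch on a made-up 1-D array:

```python
import numpy as np

# Illustrative values, one per bin plus a boundary value
x = np.array([5, 12, 13, 30, 50])

# Every value <= 12 is also <= 24 and <= 36; np.select uses the FIRST True condition,
# so the conditions must be listed from the tightest threshold upwards
conditions = [x <= 12, x <= 24, x <= 36]
choices = [1, 2, 3]

# Values matching none of the conditions (here: > 36) get the default
labels = np.select(conditions, choices, default=4)
print(labels)  # [1 1 2 3 4]
```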
Another solution, if you always need the values [1,2,3,4] with your conditions (and the values are integers):
subtract 1, use integer division by 12, then add 1.
DataFrame.clip is also added to cap values outside the thresholds at the minimum and maximum:
cols = ['ColA','ColB','ColC']
df[cols] = (df[cols].clip(lower=1, upper=37) - 1) // 12 + 1
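A sketch that applies the integer-division formula to the question's dataframe and cross-checks it against pandas.cut, which is a different technique for the same binning. The bin edges and labels passed to pd.cut are an assumption chosen to match the question's rules (right-closed bins, so 12 lands in the first bin).

```python
import numpy as np
import pandas as pd

# Data from the question
df = pd.DataFrame({
    "ColA": [1, 12, 36, 41],
    "ColB": [4, 1, 18, 45],
    "ColC": [13, 24, 1, 1],
    "ColD": ["ABC", "ABC", "ABC", "ABC"],
})
cols = ["ColA", "ColB", "ColC"]

# Integer-division formula from the answer above
by_division = (df[cols].clip(lower=1, upper=37) - 1) // 12 + 1

# Same binning with pd.cut: bins (-inf, 12], (12, 24], (24, 36], (36, inf)
bins = [-np.inf, 12, 24, 36, np.inf]
by_cut = df[cols].apply(
    lambda s: pd.cut(s, bins=bins, labels=[1, 2, 3, 4]).astype("int64")
)

print(by_division)
print(by_division.equals(by_cut))  # True
```

The two agree for integer data, but not in general: for 12.5 the division formula gives (12.5 - 1) // 12 + 1 = 1.0, while pd.cut puts it in (12, 24] and returns 2.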