I've got a DataFrame that looks like this:
   cat  val
0    1   10
1    1    4
2    2    6
3    2    2
4    1    8
5    2   12
Where cat is category, and val is value. I would like to create a column, called scaled, that is linearly scaled/normalized to 0-1, on a per-category basis. I know how to do the former - ((val - min) / (max - min)) - at the column level, and I also know how to perform operations on a per-category basis, I just don't know how to combine the two. The desired result is:
   cat  val  scaled
0    1   10       1
1    1    4       0
2    2    6     0.4
3    2    2       0
4    1    8   0.667
5    2   12       1
Ideally I'd like to stick to using Pandas only.
Any help would be appreciated, thank you!
Your scaling is to subtract the min and divide by the range, so use groupby + transform to broadcast those properties back to every row for that group and do the math.
import numpy as np
gp = df.groupby('cat')['val']
# transform returns one value per original row, so the arithmetic aligns with df['val']
df['scaled'] = (df['val'] - gp.transform('min')) / gp.transform(np.ptp)
   cat  val    scaled
0    1   10  1.000000
1    1    4  0.000000
2    2    6  0.400000
3    2    2  0.000000
4    1    8  0.666667
5    2   12  1.000000
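For a version that can be pasted and run as-is, here is a minimal sketch that rebuilds the frame from the question and spells the range out as max minus min instead of np.ptp. The string names 'min' and 'max' and the intermediate variables are my own choices, not part of the snippet above.

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({'cat': [1, 1, 2, 2, 1, 2],
                   'val': [10, 4, 6, 2, 8, 12]})

gp = df.groupby('cat')['val']

# Each transform returns a Series indexed like df, holding the group's value on every row
group_min = gp.transform('min')
group_max = gp.transform('max')

# (val - min) / (max - min), evaluated row by row with per-category min and max
df['scaled'] = (df['val'] - group_min) / (group_max - group_min)

print(df)
```

This should give the same scaled column as the table above.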
For aggregations that reduce to a scalar, groupby + agg/apply reduces to a single row per group; however groupby + transform returns a like-Indexed Series so that it aligns to the original DataFrame.
gp.min()
#cat
#1 4
#2 2
#Name: val, dtype: int64
gp.transform('min')
#0 4
#1 4
#2 2
#3 2
#4 4
#5 2
#Name: val, dtype: int64
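To make the difference concrete, the sketch below compares the two results on the question's data and shows that transform behaves like mapping the per-group result back onto each row through the cat column. The variable names (per_group, per_row, mapped) are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'cat': [1, 1, 2, 2, 1, 2],
                   'val': [10, 4, 6, 2, 8, 12]})
gp = df.groupby('cat')['val']

per_group = gp.min()           # one row per category, indexed by cat
per_row = gp.transform('min')  # one row per original row, indexed like df

# Looking up each row's category in the per-group result gives the same values
mapped = df['cat'].map(per_group)

print(len(per_group), len(per_row))
print(per_row.tolist() == mapped.tolist())
```

Because per_row shares df's index, it can be used directly in arithmetic with df['val'] without any merge or join.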
You can use the following lines of code to do the scaling based on another column:
import pandas as pd
df = pd.DataFrame({'Group': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3], 'Values': [1, 4, -2, 7, 3, 4, 1, -5, 12, 4, 10, 2, 6, 20, 15]})
# Normalize around the mean (subtract the group mean, divide by the group std)
df['mean_normal'] = df.groupby('Group')['Values'].transform(lambda x: (x - x.mean()) / x.std())
# Normalize between 0 and 1 within each group
df['min_max_normal'] = df.groupby('Group')['Values'].transform(lambda x: (x - x.min()) / (x.max() - x.min()))
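One edge case to watch with the 0-1 scaling: if every value in a group is the same, max - min is 0 and the division produces NaN. Here is a rough sketch of one way to handle that, using a small made-up frame in which group 2 is constant. Filling those rows with 0.0 is my assumption about what is wanted; leaving them as NaN may suit you better.

```python
import pandas as pd

# Hypothetical data: group 2 has a single repeated value, so its range is 0
df = pd.DataFrame({'Group': [1, 1, 2, 2], 'Values': [3, 7, 5, 5]})

gp = df.groupby('Group')['Values']
group_min = gp.transform('min')
group_range = gp.transform('max') - group_min

# Replace a zero range with NaN so the division yields NaN rather than 0/0,
# then fill with 0.0 (note: fillna would also fill NaN coming from missing inputs)
safe_range = group_range.where(group_range != 0)
df['min_max_normal'] = ((df['Values'] - group_min) / safe_range).fillna(0.0)

print(df)
```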