Pandas dataframe scale column based on another column
I've got a DataFrame that looks like this:
cat val
0 1 10
1 1 4
2 2 6
3 2 2
4 1 8
5 2 12
Where cat is category, and val is value. I would like to create a column, called scaled, that is linearly scaled/normalized to 0-1, on a per-category basis. I know how to do the former, ((val - min) / (max - min)), at the column level, and I also know how to perform operations on a per-category basis; I just don't know how to combine the two. The desired result is:
cat val scaled
0 1 10 1
1 1 4 0
2 2 6 0.4
3 2 2 0
4 1 8 0.667
5 2 12 1
Ideally I'd like to stick to using Pandas only.
Any help would be appreciated, thank you!
Your scaling is to subtract the min and divide by the range, so use groupby + transform to broadcast those properties back to every row for that group and do the math.
gp = df.groupby('cat')['val']
# Broadcast each group's min back to every row; the range is the group's max minus its min
gmin = gp.transform('min')
df['scaled'] = (df['val'] - gmin) / (gp.transform('max') - gmin)
cat val scaled
0 1 10 1.000000
1 1 4 0.000000
2 2 6 0.400000
3 2 2 0.000000
4 1 8 0.666667
5 2 12 1.000000
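For reference, here is a self-contained sketch that rebuilds the question's frame and applies the same idea. The helper name scale_by_group and the choice to map constant groups to 0.0 are assumptions for illustration, not part of the answer above; without that guard, a group whose values are all equal has a range of zero and the division yields NaN.

```python
import pandas as pd

df = pd.DataFrame({"cat": [1, 1, 2, 2, 1, 2], "val": [10, 4, 6, 2, 8, 12]})


def scale_by_group(frame, group_col, value_col):
    """Min-max scale value_col to 0-1 within each group (hypothetical helper)."""
    grouped = frame.groupby(group_col)[value_col]
    group_min = grouped.transform("min")                 # per-row copy of the group's min
    group_range = grouped.transform("max") - group_min   # per-row copy of the group's range
    scaled = (frame[value_col] - group_min) / group_range
    # Assumption: rows in a zero-range group (all values equal) are set to 0.0.
    return scaled.mask(group_range == 0, 0.0)


df["scaled"] = scale_by_group(df, "cat", "val")
print(df)
```

Because the guard only touches rows whose group range is zero, missing values in val still come through as NaN rather than being silently replaced.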
For aggregations that reduce to a scalar, groupby + agg/apply reduces to a single row per group; however, groupby + transform returns a like-indexed Series so that it aligns to the original DataFrame.
gp.min()
#cat
#1 4
#2 2
#Name: val, dtype: int64
gp.transform('min')
#0 4
#1 4
#2 2
#3 2
#4 4
#5 2
#Name: val, dtype: int64
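To make the broadcasting concrete, the sketch below rebuilds the same per-row minimum by hand: it aggregates to one row per group and then looks up each row's category in that result with Series.map. This only illustrates what transform does; the frame is the one from the question and the variable names are assumptions.

```python
import pandas as pd

df = pd.DataFrame({"cat": [1, 1, 2, 2, 1, 2], "val": [10, 4, 6, 2, 8, 12]})
gp = df.groupby("cat")["val"]

per_group = gp.min()                 # one row per category, indexed by cat
by_hand = df["cat"].map(per_group)   # look up each row's category: one value per row
via_transform = gp.transform("min")  # same values, already aligned to df.index

print(per_group.shape, via_transform.shape)                    # (2,) (6,)
print((by_hand.to_numpy() == via_transform.to_numpy()).all())  # True
```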
You can use the following lines of code to do the scaling based on another column:
import pandas as pd
df = pd.DataFrame({'Group': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3], 'Values': [1, 4, -2, 7, 3, 4, 1, -5, 12, 4, 10, 2, 6, 20, 15]})
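# Normalize around the mean: subtract the group mean, then divide by the group std
df['mean_normal'] = df.groupby('Group')['Values'].transform(lambda x: (x - x.mean()) / x.std())
# Normalize between 0 and 1 (select 'Values' so only that column is transformed)
df['min_max_normal'] = df.groupby('Group')['Values'].transform(lambda x: (x - x.min()) / (x.max() - x.min()))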
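As a quick sanity check on the two normalizations, the following sketch computes them with named aggregations instead of lambdas and summarizes the result per group. It assumes that "normalize around mean" means the z-score, (x - mean) / std; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Group": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3],
    "Values": [1, 4, -2, 7, 3, 4, 1, -5, 12, 4, 10, 2, 6, 20, 15],
})
g = df.groupby("Group")["Values"]

# Z-score within each group (pandas std is the sample standard deviation, ddof=1)
df["mean_normal"] = (df["Values"] - g.transform("mean")) / g.transform("std")

# Min-max within each group
gmin = g.transform("min")
df["min_max_normal"] = (df["Values"] - gmin) / (g.transform("max") - gmin)

# Per-group summary: z-scores should have mean ~0 and std ~1,
# min-max values should span exactly 0 to 1.
z_summary = df.groupby("Group")["mean_normal"].agg(["mean", "std"])
mm_summary = df.groupby("Group")["min_max_normal"].agg(["min", "max"])
print(z_summary.round(6))
print(mm_summary)
```

Passing the aggregation names as strings lets pandas use its built-in grouped implementations, which tends to be faster than calling a Python lambda once per group.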