Pandas dataframe scale column based on another column

I've got a DataFrame that looks like this:

    cat   val
0     1    10
1     1     4
2     2     6
3     2     2
4     1     8
5     2    12

Where cat is category, and val is value. I would like to create a column, called scaled, that is linearly scaled/normalized to 0-1 on a per-category basis. I know how to do the scaling, ((val - min) / (max - min)), at the column level, and I also know how to perform operations on a per-category basis; I just don't know how to combine the two. The desired result is:

    cat   val  scaled
0     1    10       1  
1     1     4       0
2     2     6     0.4
3     2     2       0
4     1     8   0.667
5     2    12       1

Ideally I'd like to stick to using Pandas only.

Any help would be appreciated, thank you!

Your scaling subtracts the min and divides by the range, so use groupby + transform to broadcast those per-group properties back to every row of the group, then do the math.

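import numpy as np

# Group the values by category once and reuse the grouped object.
gp = df.groupby('cat')['val']

# transform broadcasts each group's min and range (np.ptp = max - min) back to every row.
df['scaled'] = (df['val'] - gp.transform('min')) / gp.transform(np.ptp)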

   cat  val    scaled
0    1   10  1.000000
1    1    4  0.000000
2    2    6  0.400000
3    2    2  0.000000
4    1    8  0.666667
5    2   12  1.000000
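For readers who want to run this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the DataFrame from the question and, as a swapped-in variation, computes the range from the string aliases 'min' and 'max' instead of np.ptp; the names group_min and group_max are only illustrative.

```python
import pandas as pd

# The example frame from the question.
df = pd.DataFrame({"cat": [1, 1, 2, 2, 1, 2],
                   "val": [10, 4, 6, 2, 8, 12]})

gp = df.groupby("cat")["val"]

# Per-row group minimum and maximum, aligned to df's index.
group_min = gp.transform("min")
group_max = gp.transform("max")

# (val - min) / (max - min), evaluated row by row within each category.
df["scaled"] = (df["val"] - group_min) / (group_max - group_min)
print(df.round(3))
```

Because both transformed Series share df's index, the arithmetic lines up row by row without any merge.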

For aggregations that reduce to a scalar, groupby + agg/apply returns a single row per group; groupby + transform, however, returns a Series with the same index as the original, so it aligns with the original DataFrame.

gp.min()
#cat
#1    4
#2    2
#Name: val, dtype: int64

gp.transform('min')
#0    4
#1    4
#2    2
#3    2
#4    4
#5    2
#Name: val, dtype: int64

You can use the following lines of code to do the scaling based on another column:

import pandas as pd

df = pd.DataFrame({'Group': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3], 'Values': [1, 4, -2, 7, 3, 4, 1, -5, 12, 4, 10, 2, 6, 20, 15]})

# Select the column before transforming so each result is a single Series.
grouped = df.groupby('Group')['Values']

# Normalize around the mean: subtract the group mean, then divide by the group std
df['mean_normal'] = grouped.transform(lambda x: (x - x.mean()) / x.std())
# Normalize between 0 and 1
df['min_max_normal'] = grouped.transform(lambda x: (x - x.min()) / (x.max() - x.min()))
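One edge case the min-max formula does not cover is a group whose values are all equal (or a group with a single row): the range is zero, so the division produces NaN. The sketch below shows one possible way to guard against that. The helper name min_max_scale and the choice to map such groups to 0.0 are assumptions made for illustration, not part of the answers above.

```python
import pandas as pd

def min_max_scale(values: pd.Series, keys: pd.Series) -> pd.Series:
    """Hypothetical helper: scale `values` to [0, 1] within each group of `keys`."""
    grouped = values.groupby(keys)
    low = grouped.transform("min")
    span = grouped.transform("max") - low
    # A zero span means every value in the group is equal. Mask it to NaN so
    # the division yields NaN, then fill with 0.0 (an assumed convention).
    # Note that fillna also fills NaN coming from missing input values.
    return ((values - low) / span.where(span != 0)).fillna(0.0)

df = pd.DataFrame({"Group": [1, 1, 2, 2, 3], "Values": [1, 5, 7, 7, 4]})
df["min_max_normal"] = min_max_scale(df["Values"], df["Group"])
print(df)
```

If NaN is the more meaningful result for constant groups in your data, dropping the final fillna call keeps it.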

