Pandas DataFrame: scale a column based on another column

Question

I've got a DataFrame that looks like this:

    cat   val
0     1    10
1     1     4
2     2     6
3     2     2
4     1     8
5     2    12

Where cat is the category and val is the value. I would like to create a column called scaled that is linearly scaled (normalized) to 0-1 on a per-category basis. I know how to do the scaling itself, ((val - min) / (max - min)), at the column level, and I also know how to perform operations on a per-category basis; I just don't know how to combine the two. The desired result is:

    cat   val  scaled
0     1    10       1  
1     1     4       0
2     2     6     0.4
3     2     2       0
4     1     8   0.667
5     2    12       1
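For reference, each expected value is the min-max formula applied within the row's own category. Writing c(i) for the set of rows that share row i's category (notation introduced here only for illustration), two of the rows above work out as:

```latex
\begin{aligned}
\text{scaled}_i &= \frac{\text{val}_i - \min_{j \in c(i)} \text{val}_j}{\max_{j \in c(i)} \text{val}_j - \min_{j \in c(i)} \text{val}_j} \\
\text{row 4 (cat 1: } 10, 4, 8) &: \quad \frac{8 - 4}{10 - 4} = \frac{4}{6} \approx 0.667 \\
\text{row 2 (cat 2: } 6, 2, 12) &: \quad \frac{6 - 2}{12 - 2} = \frac{4}{10} = 0.4
\end{aligned}
```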

Ideally I'd like to stick to using Pandas only.

Any help would be appreciated, thank you!

Answer 1

Your scaling subtracts the min and divides by the range, so use groupby + transform to broadcast those per-group properties back to every row of the group, and then do the math.

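import pandas as pd

df = pd.DataFrame({'cat': [1, 1, 2, 2, 1, 2], 'val': [10, 4, 6, 2, 8, 12]})

gp = df.groupby('cat')['val']

# Broadcast each category's min and range (max - min) back to its rows
gp_min = gp.transform('min')
df['scaled'] = (df['val'] - gp_min) / (gp.transform('max') - gp_min)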

   cat  val    scaled
0    1   10  1.000000
1    1    4  0.000000
2    2    6  0.400000
3    2    2  0.000000
4    1    8  0.666667
5    2   12  1.000000
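One case the snippet above does not cover: if every value in a category is identical, max - min is 0 and that category's rows come out as NaN (0/0). The sketch below assumes you would rather report such constant categories as 0.0; the category 3 rows are hypothetical data added only to trigger the case.

```python
import pandas as pd

# Original data plus a hypothetical category 3 whose values are all equal
df = pd.DataFrame({'cat': [1, 1, 2, 2, 1, 2, 3, 3],
                   'val': [10, 4, 6, 2, 8, 12, 5, 5]})

gp = df.groupby('cat')['val']
gp_min = gp.transform('min')
gp_range = gp.transform('max') - gp_min

# Turn zero ranges into NaN so the division yields NaN explicitly,
# then (by assumption) report constant categories as 0.0.
# Note: fillna would also overwrite NaNs caused by missing values in 'val'.
df['scaled'] = ((df['val'] - gp_min) / gp_range.where(gp_range != 0)).fillna(0.0)
print(df)
```

Whether 0.0, 0.5 or NaN is the right value for a constant category depends on what the scaled column is used for afterwards.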

For aggregations that reduce to a scalar, groupby + agg/apply reduces to a single row per group; groupby + transform, however, returns a like-indexed Series that aligns with the original DataFrame.

gp.min()
#cat
#1    4
#2    2
#Name: val, dtype: int64

gp.transform('min')
#0    4
#1    4
#2    2
#3    2
#4    4
#5    2
#Name: val, dtype: int64
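To make the alignment point concrete: the one-row-per-group result of gp.min() can also be looked up row by row with Series.map on the cat column, which yields the same values that transform broadcasts directly. This is a small illustrative check, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'cat': [1, 1, 2, 2, 1, 2], 'val': [10, 4, 6, 2, 8, 12]})
gp = df.groupby('cat')['val']

per_group = gp.min()               # one row per category, indexed by cat
mapped = df['cat'].map(per_group)  # look up each row's category min
broadcast = gp.transform('min')    # same values, already aligned to df's index

print((mapped == broadcast).all())  # True
```

transform is usually the more direct choice, since it skips the explicit lookup step and returns a result already indexed like df.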

Answer 2

You can use the following lines of code to do the scaling within each group defined by another column:

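import pandas as pd

df = pd.DataFrame({'Group': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3], 'Values': [1, 4, -2, 7, 3, 4, 1, -5, 12, 4, 10, 2, 6, 20, 15]})

# Select the 'Values' column so transform returns a Series aligned with df
# (otherwise the second transform would also pick up the new 'mean_normal' column)
grouped = df.groupby('Group')['Values']

# Standardize within each group (z-score): subtract the mean, then divide by the std
df['mean_normal'] = grouped.transform(lambda x: (x - x.mean()) / x.std())
# Normalize between 0 and 1 within each group
df['min_max_normal'] = grouped.transform(lambda x: (x - x.min()) / (x.max() - x.min()))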
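As an alternative to the per-group lambdas, the same two columns can be computed by passing built-in reductions to transform by name ('mean', 'std', 'min', 'max'), which lets pandas use its own group implementations instead of calling a Python function once per group. The sketch below reuses the data above and adds a quick sanity check; the summary column names are made up for illustration. pandas' std defaults to the sample standard deviation (ddof=1), matching x.std() in the lambda.

```python
import pandas as pd

df = pd.DataFrame({'Group': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3],
                   'Values': [1, 4, -2, 7, 3, 4, 1, -5, 12, 4, 10, 2, 6, 20, 15]})
g = df.groupby('Group')['Values']

# Z-score within each group using broadcast group statistics
df['mean_normal'] = (df['Values'] - g.transform('mean')) / g.transform('std')

# Min-max scaling within each group
g_min = g.transform('min')
df['min_max_normal'] = (df['Values'] - g_min) / (g.transform('max') - g_min)

# Sanity check per group: z-scores have mean ~0 and std ~1, min-max spans 0..1
summary = df.groupby('Group').agg(z_mean=('mean_normal', 'mean'),
                                  z_std=('mean_normal', 'std'),
                                  mm_min=('min_max_normal', 'min'),
                                  mm_max=('min_max_normal', 'max'))
print(summary)
```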

Answer 1: score 2, accepted, 2021-06-23 14:32:58
Answer 2: score 0, 2022-06-19 02:47:13
