
Is there a Python package that supports Min-Max and Sum scaling of a data sample?

I've been looking for a Python package or command that can scale a given data sample to a predefined minimum, maximum and total sum. I've tried the MinMaxScaler class from the sklearn.preprocessing package, as in the example below.

Given a base sample:

from sklearn.preprocessing import MinMaxScaler
import pandas as pd

base_sample = pd.DataFrame([5, 20, 30, 35, 45, 60])

sample_min = 10
sample_max = 50

scaler = MinMaxScaler(feature_range=(sample_min, sample_max))

scaled_sample = scaler.fit_transform(base_sample)

print(scaled_sample)

Producing:

[[10.        ]
 [20.90909091]
 [28.18181818]
 [31.81818182]
 [39.09090909]
 [50.        ]]

With sum:

print(scaled_sample.sum())
180.0

What I need, however, is a command that does the above but also hits a predefined sum that differs from this one, for example the sum of the original sample:

print(base_sample.to_numpy().sum())
195

or any other predefined sum. In essence, the values between the min and max have to be scaled so that they match the sum without violating the min and max constraints. I've been doing this kind of exercise for a long time in a commercial tool that unfortunately does not let me look under the hood at the underlying formulation. Any suggestions on how to proceed would be very welcome.

Maybe this works:

scaled_sample / scaler.scale_

scaler.scale_ is equivalent to (max - min) / (X.max(axis=0) - X.min(axis=0)).
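
To see what this expression evaluates to on the question's sample, here is a small check (a sketch added for illustration, not part of the original answer). It relies on the fitted attributes scale_ and min_ of MinMaxScaler, for which the transform is X * scale_ + min_. Dividing by scale_ therefore returns the original values shifted by the constant min_ / scale_, so the result is not guaranteed to stay inside the requested min and max or to hit a chosen sum.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Same values as the question's base sample, as a single-column array.
base = np.array([[5.0], [20.0], [30.0], [35.0], [45.0], [60.0]])

scaler = MinMaxScaler(feature_range=(10, 50))
scaled = scaler.fit_transform(base)

# MinMaxScaler applies X * scale_ + min_, so dividing by scale_
# gives back X plus a constant offset of min_ / scale_.
unscaled = scaled / scaler.scale_
offset = scaler.min_ / scaler.scale_

print(unscaled.ravel())
print(unscaled.sum(), unscaled.min(), unscaled.max())
```

For this sample the offset is 8.75, so the values run from 13.75 to 68.75 and sum to 247.5, which misses both the bounds and the target sum of 195.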

With a linear transformation, this is not possible. A linear (affine) map y = a*x + b has only two free parameters, so once the minimum and maximum are fixed, the sum is determined as well. You cannot transform the values of a vector to an arbitrary minimum, maximum and sum. You can achieve this with non-linear transforms, since they can scale some values up or down more than others, which lets you control the extrema while adjusting the sum (or vice versa). This becomes an optimization problem with infinitely many solutions, because the transformed values between the extrema can be placed almost arbitrarily. You can narrow it down by fixing the form of the transformation function.
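
As one possible illustration of fixing the transformation function, the sketch below uses a power-law family: the sample is normalised to [0, 1], raised to an exponent p, and mapped onto [new_min, new_max]. The endpoints stay fixed for every p > 0, while the sum decreases monotonically as p grows, so a simple bisection can find the p that matches the target sum. The function name scale_min_max_sum, the power-law choice and the bisection bounds are all assumptions made for this example; this is not the formulation used by any particular tool, and it assumes new_max > new_min and at least two distinct input values.

```python
import numpy as np

def scale_min_max_sum(x, new_min, new_max, target_sum, iters=200):
    """Hypothetical helper: power-law rescaling that hits min, max and sum."""
    x = np.asarray(x, dtype=float)
    # Normalise to [0, 1]; assumes x.max() > x.min().
    t = (x - x.min()) / (x.max() - x.min())

    def total(p):
        # Sum of the rescaled sample for exponent p (decreasing in p).
        return (new_min + (new_max - new_min) * t ** p).sum()

    lo_p, hi_p = 1e-6, 1e6  # assumed search bounds for the exponent
    if not (total(hi_p) <= target_sum <= total(lo_p)):
        raise ValueError("target_sum is not reachable with this transform")

    for _ in range(iters):
        mid = np.sqrt(lo_p * hi_p)  # bisect on a log scale
        if total(mid) > target_sum:
            lo_p = mid  # sum too large, so a larger exponent is needed
        else:
            hi_p = mid

    p = np.sqrt(lo_p * hi_p)
    return new_min + (new_max - new_min) * t ** p

base = [5, 20, 30, 35, 45, 60]
scaled = scale_min_max_sum(base, new_min=10, new_max=50, target_sum=195)
print(scaled, scaled.sum())
```

With this family the reachable sums lie roughly between (n - 1) * new_min + new_max and new_min + (n - 1) * new_max, which is about 100 to 260 for the six-value sample above; targets outside that range raise an error. Other monotone families would give different, equally valid solutions.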
