
Vectorized "by-layer" scaling of a numpy array

I have a numpy array (let's say 100x64x64).

My goal is to scale each 64x64 layer independently and store the scalers for later use.

This is how it can be done with a for-loop:

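import joblib
from sklearn.preprocessing import MinMaxScaler

# X: float array of shape (100, 64, 64); an integer array would truncate the scaled values
scalers_dict = {}
for i in range(X.shape[0]):
    scalers_dict[i] = MinMaxScaler()
    # fit the scaler on layer i and scale the layer in place
    # (MinMaxScaler treats each of the layer's 64 columns as a separate feature)
    X[i, :, :] = scalers_dict[i].fit_transform(X[i, :, :])
# save the dict of scalers
joblib.dump(value=scalers_dict, filename="dict_of_scalers.scaler")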
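Note that `fit_transform` on a single 64x64 layer treats the layer's 64 rows as samples and its 64 columns as features, so the loop above stores one min and one max per column of each layer, not per layer. If that per-column behaviour is what you want, it can be reproduced without a loop by reducing along `axis=1`. A numpy-only sketch, assuming a float array; the function name `loop_equivalent_scale` is hypothetical and random data stands in for `X`:

```python
import numpy as np

def loop_equivalent_scale(X):
    """Vectorized min-max scaling of a (n_layers, h, w) array with one
    min/max per column of each layer, i.e. the statistics the loop gets by
    fitting a separate MinMaxScaler on every X[i] (hypothetical helper)."""
    col_min = X.min(axis=1, keepdims=True)   # shape (n_layers, 1, w)
    col_max = X.max(axis=1, keepdims=True)   # shape (n_layers, 1, w)
    span = col_max - col_min
    span = np.where(span == 0, 1.0, span)    # constant column: avoid division by zero
    return (X - col_min) / span, col_min, col_max

# Random stand-in for the real data
gen = np.random.default_rng(3)
X = gen.random((100, 64, 64))
X_scaled, col_min, col_max = loop_equivalent_scale(X)

# Compare against an explicit per-layer loop (numpy only)
expected = np.stack([
    (X[i] - X[i].min(axis=0)) / (X[i].max(axis=0) - X[i].min(axis=0))
    for i in range(X.shape[0])
])
print(np.allclose(X_scaled, expected))   # True
```

The returned `col_min` and `col_max` arrays (shape `(100, 1, 64)`) could be saved with joblib in place of the dictionary of scalers. Both answers below compute one min and one max per whole layer instead, which is a coarser scaling.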

My real array is much bigger, and iterating through it takes quite a while.

Is there a more vectorized solution, or is a for-loop the only way?

If I understand correctly how MinMaxScaler works, it treats each column of a 2D input as an independent feature and computes that feature's min and max along axis=0 (over the rows).

To make this useful for your case, you need to transform X into a (64 * 64, 100) array, so that each column holds one flattened layer:

import numpy as np

s = X.shape
X = np.moveaxis(X, 0, -1).reshape(-1, s[0])

Alternatively, you can write

X = X.reshape(s[0], -1).T
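A quick check that the two reshapes produce the same (64 * 64, 100) layout, where column j holds layer j flattened in row-major order. This is a sketch with a small random array standing in for the real data:

```python
import numpy as np

# Random stand-in for the real data: 100 layers of 64x64
rng = np.random.default_rng(0)
X = rng.random((100, 64, 64))
s = X.shape

# Option 1: move the layer axis to the end, then flatten the two 64-axes
A = np.moveaxis(X, 0, -1).reshape(-1, s[0])

# Option 2: flatten each layer into a row, then transpose
B = X.reshape(s[0], -1).T

print(A.shape)                                 # (4096, 100)
print(np.array_equal(A, B))                    # True
# Column 7 is layer 7, flattened
print(np.array_equal(A[:, 7], X[7].ravel()))   # True
```

The second form is usually cheaper: `X.reshape(s[0], -1)` on a contiguous array is a view and `.T` is also a view, whereas reshaping the result of `np.moveaxis` has to copy.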

Now you can do the scaling with

from sklearn.preprocessing import MinMaxScaler

M = MinMaxScaler()
X = M.fit_transform(X)

Since the fit reduces along the first dimension (axis=0), all the fitted statistics (for example data_min_ and data_max_) have size 100, one value per layer. They broadcast cleanly against the data because its last dimension now also has size 100.
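To make the broadcasting concrete, here is a numpy-only sketch of the arithmetic a min-max fit with the default range (0, 1) amounts to on the reshaped array. It is not scikit-learn's own code, just the same computation written out, and it assumes no layer is constant:

```python
import numpy as np

rng = np.random.default_rng(1)
X2d = rng.random((64 * 64, 100))   # reshaped data: rows = pixels, columns = layers

# Reducing along axis 0 gives one statistic per column, i.e. per layer
col_min = X2d.min(axis=0)          # shape (100,)
col_max = X2d.max(axis=0)          # shape (100,)

# (4096, 100) combined with (100,): the vector is matched against the last axis
scaled = (X2d - col_min) / (col_max - col_min)

print(col_min.shape, scaled.shape)   # (100,) (4096, 100)
```

Each column of `scaled` now spans [0, 1] on its own, which is the per-layer scaling the question asks for.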

To get the original shape back, invert the original transformation:

X = X.T.reshape(s)

When you are done, M is a single scaler calibrated for 100 features, one per layer. There is no need for a dictionary here: a dictionary keyed by a sequence of integers is better expressed as a list or array, and that is effectively what the scaler's per-feature attributes are.
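Putting the steps together, a minimal end-to-end sketch, assuming scikit-learn and joblib are installed and using random data in place of the real array. The helpers `to_2d` / `from_2d` and the file name `minmax_scaler.joblib` are hypothetical. It saves the single scaler with joblib, reloads it, and uses `inverse_transform` on data reshaped the same way:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def to_2d(a):
    """(n_layers, h, w) -> (h * w, n_layers): one column per layer (hypothetical helper)."""
    return a.reshape(a.shape[0], -1).T

def from_2d(a2d, shape):
    """Inverse of to_2d (hypothetical helper)."""
    return a2d.T.reshape(shape)

rng = np.random.default_rng(2)
X = rng.random((100, 64, 64)) * 50.0   # stand-in for the real data
s = X.shape

M = MinMaxScaler()
X_scaled = from_2d(M.fit_transform(to_2d(X)), s)

# One min and one max per layer
print(M.data_min_.shape, M.data_max_.shape)   # (100,) (100,)

# Save and reload the single fitted scaler (hypothetical file name)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "minmax_scaler.joblib")
    joblib.dump(M, path)
    M_loaded = joblib.load(path)

# Undo the scaling with the reloaded scaler
X_back = from_2d(M_loaded.inverse_transform(to_2d(X_scaled)), s)
print(np.allclose(X_back, X))   # True
```

New data must go through the same `to_2d` reshape before `transform`, and it must have the same 100 layers, because the scaler is fitted per column.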

IIUC, you can also do the scaling manually, computing one min and one max per 64x64 layer:

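import joblib
import numpy as np

# inputs: the (100, 64, 64) array (X in the question)
# one min and one max per 64x64 layer
mm, MM = inputs.min(axis=(1, 2)), inputs.max(axis=(1, 2))

# save these for later use
joblib.dump((mm, MM), 'minmax.joblib')

def scale(inputs, mm, MM):
    # broadcast the per-layer statistics over the two 64-axes
    span = (MM - mm)[:, None, None]
    span = np.where(span == 0, 1.0, span)   # constant layer: avoid division by zero
    return (inputs - mm[:, None, None]) / span

# load pre-saved min & max
mm, MM = joblib.load('minmax.joblib')

# scaled inputs
scaled = scale(inputs, mm, MM)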
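The saved min and max are also all you need to undo the scaling. A numpy-only sketch, with random data standing in for `inputs`; the inverse helper `unscale` and the shared `_span` helper are hypothetical names:

```python
import numpy as np

def _span(mm, MM):
    # Per-layer range, shaped (n_layers, 1, 1) to broadcast over each 64x64 layer
    span = (MM - mm)[:, None, None]
    return np.where(span == 0, 1.0, span)   # constant layer: avoid division by zero

def scale(inputs, mm, MM):
    # Per-layer min-max scaling; mm and MM have shape (n_layers,)
    return (inputs - mm[:, None, None]) / _span(mm, MM)

def unscale(scaled, mm, MM):
    # Hypothetical inverse: map [0, 1] back to each layer's original [min, max]
    return scaled * _span(mm, MM) + mm[:, None, None]

rng = np.random.default_rng(4)
inputs = rng.random((100, 64, 64)) * 10.0 - 5.0   # stand-in for the real data
mm, MM = inputs.min(axis=(1, 2)), inputs.max(axis=(1, 2))

scaled = scale(inputs, mm, MM)
restored = unscale(scaled, mm, MM)
print(np.allclose(restored, inputs))   # True
```

Storing two arrays of shape (100,) is much lighter than pickling 100 scaler objects, and the whole operation is a single broadcast expression.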
