
How to reconstruct raw data from scaled data?

I have some data to which I applied scaling with scikit-learn. Once it is scaled, I would like to recover the original data. Is this possible? If not, how can I map the scaled values back to the original data?

Here is a toy example:

from sklearn.datasets import load_iris
from sklearn.preprocessing import scale

iris = load_iris()
X = iris.data
X_scale = scale(X)  # standardize each column to zero mean and unit variance
print(X[:4])
print(X_scale[:4])

producing:

[[ 5.1  3.5  1.4  0.2]
 [ 4.9  3.   1.4  0.2]
 [ 4.7  3.2  1.3  0.2]
 [ 4.6  3.1  1.5  0.2]]
[[-0.90068117  1.03205722 -1.3412724  -1.31297673]
 [-1.14301691 -0.1249576  -1.3412724  -1.31297673]
 [-1.38535265  0.33784833 -1.39813811 -1.31297673]
 [-1.50652052  0.10644536 -1.2844067  -1.31297673]]

How can I recover the original data from the second array?

One of the most common feature scaling methods standardizes the data by setting the mean of a data set to zero and the standard deviation to one. This is extremely useful for many learning algorithms. It is achieved with the following:

scaled_array = (original_array - mean_of_array)/std_of_array

In Sklearn, each array column appears to be scaled in this way. To recover the original data, rearrange the above: calculate the mean and standard deviation of each column of the unscaled data, then multiply the scaled data by the standard deviation and add the mean. As long as you keep these two values, you can transform the scaled data back to the original data at any time.
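Written out per column, the rearrangement looks roughly like this. The symbols are notation introduced here for illustration, not taken from the original answer: x_{ij} is the original value in row i and column j, z_{ij} is its scaled value, and \mu_j and \sigma_j are the mean and standard deviation of column j over n rows. The equivalence assumes \sigma_j is not zero.

```latex
z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}
\quad\Longleftrightarrow\quad
x_{ij} = z_{ij}\,\sigma_j + \mu_j,
\qquad
\mu_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij},
\quad
\sigma_j = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_{ij} - \mu_j\right)^2}
```

Note the 1/n in the standard deviation: as far as I can tell from the scikit-learn documentation, scale uses the population standard deviation (the same as numpy.std with its default ddof=0), so the same convention should be used when inverting.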

For more information on how Sklearn's scaling works, see the scikit-learn documentation. To understand more about feature scaling generally, the Wikipedia page on feature scaling is a good place to start.

MarkyD43 has provided a great answer to this question. Here is a code version of transforming the data back to the original values:

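from sklearn.datasets import load_iris
from sklearn.preprocessing import scale

iris = load_iris()
X = iris.data

# Per-column statistics of the unscaled data, needed to undo the scaling
mean_of_array = X.mean(axis=0)
std_of_array = X.std(axis=0)

X_scale = scale(X)

# Invert the scaling: multiply by the standard deviation, then add the mean
X_original = (X_scale * std_of_array) + mean_of_array

print(X[:4])
print(X_original[:4])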

producing:

[[ 5.1  3.5  1.4  0.2]
 [ 4.9  3.   1.4  0.2]
 [ 4.7  3.2  1.3  0.2]
 [ 4.6  3.1  1.5  0.2]]
[[ 5.1  3.5  1.4  0.2]
 [ 4.9  3.   1.4  0.2]
 [ 4.7  3.2  1.3  0.2]
 [ 4.6  3.1  1.5  0.2]]
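The manual approach above needs the mean and standard deviation to be kept around separately. As an alternative sketch (not part of either answer), scikit-learn's StandardScaler class stores those statistics on the fitted object and offers an inverse_transform method, so the scaler itself can undo the scaling. The variable names below are only illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X = load_iris().data

# Fit the scaler once; it stores the per-column mean (mean_) and
# standard deviation (scale_) of the unscaled data.
scaler = StandardScaler()
X_scale = scaler.fit_transform(X)

# inverse_transform undoes the scaling using the stored statistics,
# i.e. X_scale * scaler.scale_ + scaler.mean_ for each column.
X_original = scaler.inverse_transform(X_scale)

# The round trip should match the input up to floating-point error.
print(np.allclose(X, X_original))  # expected: True
```

This is mostly useful when the unscaled array is no longer available later on, for example when mapping model outputs back to the original units: only the fitted scaler object has to be kept.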
