How to reconstruct raw data from scaled data?
I have some data to which I applied scaling with scikit-learn. Once it is scaled, I would like to recover the original data. Is this possible? If not, how can I map the scaled values back to the original data?
Here is a toy example:
from sklearn.datasets import load_iris
from sklearn.preprocessing import scale
iris = load_iris()
X = iris.data
X_scale = scale(X)
print(X[:4])
print(X_scale[:4])
producing
[[ 5.1 3.5 1.4 0.2]
[ 4.9 3. 1.4 0.2]
[ 4.7 3.2 1.3 0.2]
[ 4.6 3.1 1.5 0.2]]
[[-0.90068117 1.03205722 -1.3412724 -1.31297673]
[-1.14301691 -0.1249576 -1.3412724 -1.31297673]
[-1.38535265 0.33784833 -1.39813811 -1.31297673]
[-1.50652052 0.10644536 -1.2844067 -1.31297673]]
How can I recover the original data from the second array?
One of the most common feature scaling methods standardizes the data by shifting its mean to zero and rescaling its standard deviation to one. This is extremely useful for many learning algorithms. It is achieved simply with the following:
scaled_array = (original_array - mean_of_array)/std_of_array
In Sklearn, each array column is scaled in this way (scale works per column by default). To find the original data, rearrange the above: multiply the scaled data by the standard deviation and add the mean. The scaled data alone does not carry these two values, so calculate the standard deviation and mean of each column from the unscaled data and keep them. You can then use them to transform the scaled data back to the original data at any time.
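To make the rearrangement concrete, here is a minimal NumPy-only sketch of the forward formula and its inverse. The toy array X is made up for illustration and is not the iris data; the variable names follow the formula above. It assumes the population standard deviation (NumPy's default, ddof=0) and that no column is constant, since a zero standard deviation would cause a division by zero.

```python
import numpy as np

# Hypothetical toy data: rows are samples, columns are features.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [4.0, 50.0]])

# Per-column statistics. Keep these: without them the scaling
# cannot be undone from the scaled data alone.
mean_of_array = X.mean(axis=0)
std_of_array = X.std(axis=0)  # population std (ddof=0)

# Forward step: zero mean and unit standard deviation per column.
scaled_array = (X - mean_of_array) / std_of_array

# Inverse step: the same formula rearranged for the original array.
recovered_array = scaled_array * std_of_array + mean_of_array

# Compare with a tolerance, since floating-point round-off is possible.
print(np.allclose(recovered_array, X))
```

The comparison uses np.allclose rather than == because the round trip may differ from the input in the last few bits.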
For more information on how Sklearn's scaling works, see the scikit-learn docs. To understand more about feature scaling generally, the wiki page on feature scaling is a good place to start.
MarkyD43 has provided a great answer to this question. Here is a code version that transforms the data back to the original values:
from sklearn.datasets import load_iris
from sklearn.preprocessing import scale
iris = load_iris()
X = iris.data
mean_of_array = X.mean(axis=0)
std_of_array = X.std(axis=0)
X_scale = scale(X)
X_original = (X_scale * std_of_array) + mean_of_array
print(X[:4])
print(X_original[:4])
producing
[[ 5.1 3.5 1.4 0.2]
[ 4.9 3. 1.4 0.2]
[ 4.7 3.2 1.3 0.2]
[ 4.6 3.1 1.5 0.2]]
[[ 5.1 3.5 1.4 0.2]
[ 4.9 3. 1.4 0.2]
[ 4.7 3.2 1.3 0.2]
[ 4.6 3.1 1.5 0.2]]
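As an alternative that neither answer above uses, scikit-learn's StandardScaler class stores the per-column mean and standard deviation when it is fitted, so the bookkeeping does not have to be done by hand. The following is a sketch under the assumption that it is acceptable to swap the scale function for the StandardScaler object; with default settings it should apply the same standardization.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X = load_iris().data

# Fitting stores the per-column statistics on the scaler object
# (scaler.mean_ and scaler.scale_).
scaler = StandardScaler()
X_scale = scaler.fit_transform(X)

# inverse_transform applies X_scale * scale_ + mean_ for us.
X_original = scaler.inverse_transform(X_scale)

# Compare with a tolerance, since floating-point round-off is possible.
print(np.allclose(X, X_original))
```

Keeping the fitted scaler around also means the same transform can be applied later to new data with scaler.transform.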