preProc = c("center", "scale") meaning in caret's package (R) and min-max normalization
I am wondering how preProc can be used within the train() function of caret. I am running a neural network in train() using the neuralnet method. The code comes from this question.
This is the code:
nn <- train(medv ~ .,
data = df,
method = "neuralnet",
tuneGrid = grid,
metric = "RMSE",
preProc = c("center", "scale", "nzv"), #good idea to do this with neural nets - your error is due to non scaled data
trControl = trainControl(
method = "cv",
number = 5,
verboseIter = TRUE)
)
The original data is not scaled, so it is recommended to scale it before running the neural network.
However, the preProc argument contains three elements: center, scale, and nzv. I am having trouble interpreting these values, as I do not know why they are present. Furthermore, I would like to scale/normalize my data using min-max. This would be the function:
maxs = apply(pk_dc_only$C, 2, max)
mins = apply(pk_dc_only$C, 2, min)
scaled = as.data.frame(scale(df, center = mins, scale = maxs - mins))
Is it possible to normalize my data using min-max scaling within preProc? And if so, how could I undo the scaling when predicting?
Of the three options in c("center", "scale", "nzv"), the first two center and scale the data. From the vignette:
method = "center" subtracts the mean of the predictor's data (again from the data in x) from the predictor values while method = "scale" divides by the standard deviation.
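As a minimal sketch of what those two steps compute, the snippet below centers and scales a toy vector by hand and compares the result with base R's scale(). The vector x and the variable names are made up for illustration; this is not caret's internal code, only the same arithmetic.

```r
# Toy predictor (hypothetical values); any numeric vector works.
x <- c(2, 4, 6, 8, 10)

# "center" subtracts the mean, "scale" divides by the standard deviation.
by_hand <- (x - mean(x)) / sd(x)

# Base R's scale() performs the same two steps by default.
with_scale <- as.numeric(scale(x))

print(by_hand)
print(all.equal(by_hand, with_scale))
```

After both steps the predictor has mean 0 and standard deviation 1, which is the usual meaning of "standardized" inputs for a neural network.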
nzv excludes variables that have near-zero variance, meaning they are almost constant and most likely not useful for prediction.
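To see which columns such a filter would drop, caret's nearZeroVar() function can be called directly. The sketch below uses a made-up data frame (toy, with hypothetical column names) and the function's default cutoffs; under those defaults the column with 99 zeros and a single 1 should be flagged, while the column of distinct values should not.

```r
library(caret)

# Hypothetical data: one informative column, one almost constant column.
toy <- data.frame(
  varied          = seq_len(100),      # 100 distinct values
  almost_constant = c(rep(0, 99), 1)   # 99 zeros and a single 1
)

# Column positions that nearZeroVar() considers near-zero variance.
nzv_idx <- nearZeroVar(toy)
print(names(toy)[nzv_idx])

# The per-column metrics behind that decision.
print(nearZeroVar(toy, saveMetrics = TRUE))
```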
To do min-max scaling, there is the "range" option:
The "range" transformation scales the data to be within 'rangeBounds'. If new samples have values larger or smaller than those in the training set, values will be outside of this range.
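The following sketch illustrates both sentences of that description on tiny made-up data frames (train_df and new_df are hypothetical). It assumes the default rangeBounds of c(0, 1), checks the result against the min-max formula written by hand, and shows that new samples outside the training range land outside [0, 1].

```r
library(caret)

# Hypothetical training data and new samples.
train_df <- data.frame(a = c(10, 20, 30), b = c(1, 3, 5))
new_df   <- data.frame(a = c(15, 40),     b = c(0, 5))

# Learn each column's min and max from the training data only.
pp <- preProcess(train_df, method = "range")

scaled_train <- predict(pp, train_df)
scaled_new   <- predict(pp, new_df)

# The same min-max formula by hand, for column a.
by_hand <- (train_df$a - min(train_df$a)) / (max(train_df$a) - min(train_df$a))

print(all.equal(scaled_train$a, by_hand))
print(scaled_new)  # a = 40 exceeds the training max, b = 0 is below the training min
```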
We try it below:
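library(mlbench)
library(caret)
data(BostonHousing)

# keep 400 of the 506 rows for training; the rest are used for testing later
idx = sample(nrow(BostonHousing), 400)
df = BostonHousing[idx, ]
# chas is a factor; convert it to numeric so it can be rescaled
df$chas = as.numeric(df$chas)

# learn the min and max of every column (including medv) from the training rows
pre_mdl = preProcess(df, method = "range")

# G is the tuning grid, assumed to be defined beforehand (not shown here)
nn <- train(medv ~ ., data = predict(pre_mdl, df),
            method = "neuralnet", tuneGrid = G,
            metric = "RMSE", trControl = trainControl(
              method = "cv", number = 5, verboseIter = TRUE))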
nn$preProcess
Created from 400 samples and 13 variables
Pre-processing:
- ignored (0)
- re-scaling to [0, 1] (13)
summary(nn$finalModel$data)
crim zn indus chas
Min. :0.000000 Min. :0.0000 Min. :0.0000 Min. :0.0000
1st Qu.:0.000821 1st Qu.:0.0000 1st Qu.:0.1646 1st Qu.:0.0000
Median :0.002454 Median :0.0000 Median :0.2969 Median :0.0000
Mean :0.042130 Mean :0.1309 Mean :0.3804 Mean :0.0625
3rd Qu.:0.039150 3rd Qu.:0.2000 3rd Qu.:0.6466 3rd Qu.:0.0000
Max. :1.000000 Max. :1.0000 Max. :1.0000 Max. :1.0000
nox rm age dis
Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.00000
1st Qu.:0.1276 1st Qu.:0.4470 1st Qu.:0.4032 1st Qu.:0.08522
Median :0.2819 Median :0.5076 Median :0.7503 Median :0.20133
Mean :0.3363 Mean :0.5232 Mean :0.6647 Mean :0.25146
3rd Qu.:0.4918 3rd Qu.:0.5880 3rd Qu.:0.9361 3rd Qu.:0.38622
Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.00000
rad tax ptratio b
Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
1st Qu.:0.1304 1st Qu.:0.1770 1st Qu.:0.5106 1st Qu.:0.9475
Median :0.1739 Median :0.2729 Median :0.6862 Median :0.9861
Mean :0.3676 Mean :0.4171 Mean :0.6243 Mean :0.8987
3rd Qu.:1.0000 3rd Qu.:0.9141 3rd Qu.:0.8085 3rd Qu.:0.9983
Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
lstat .outcome
Min. :0.0000 Min. :0.0000
1st Qu.:0.1492 1st Qu.:0.2683
Median :0.2705 Median :0.3644
Mean :0.3069 Mean :0.3902
3rd Qu.:0.4220 3rd Qu.:0.4450
Max. :1.0000 Max. :1.0000
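The code above rescales the whole data frame by hand, outcome included. As a sketch of the alternative the question asks about, "range" can also be passed to train() itself, in which case only the predictors are rescaled and predictions stay in the outcome's original units. To keep the example light it swaps in method = "lm" and the built-in mtcars data instead of neuralnet and BostonHousing; those choices are assumptions made for illustration only.

```r
library(caret)

# Min-max scaling requested inside train(); no manual preProcess() call.
# method = "lm" and mtcars are stand-ins chosen to keep the sketch fast.
fit <- train(mpg ~ ., data = mtcars,
             method = "lm",
             preProcess = "range",
             trControl = trainControl(method = "none"))

# The fitted preProcess object is stored with the model ...
print(fit$preProcess)

# ... and predict() applies it to new data automatically.
pred <- predict(fit, newdata = mtcars)
print(range(pred))  # in miles per gallon, not in [0, 1]
```

With this route there is nothing to undo at prediction time, because the response was never transformed.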
I am not sure what you mean by "undo the scaling when predicting". Maybe you meant translating the predictions back to the original scale. First, apply the same transformation to the test set:
test = BostonHousing[-idx,]
test$chas = as.numeric(test$chas)
test_medv = test$medv
test = predict(pre_mdl,test)
The range of each column is stored in the preProcess model:
pre_mdl$ranges
crim zn indus chas nox rm age dis rad tax ptratio b
[1,] 0.00632 0 0.46 1 0.385 3.561 2.9 1.1691 1 187 12.6 0.32
[2,] 88.97620 100 27.74 2 0.871 8.780 100.0 12.1265 24 711 22.0 396.90
lstat medv
[1,] 1.73 5
[2,] 36.98 50
So we write a wrapper:
convert_response = function(value,mdl,method,column){
bounds = mdl[[method]][,column]
value*diff(bounds) + min(bounds)
}
plot(test_medv,convert_response(predict(nn,test),pre_mdl,"ranges","medv"),
ylab="predicted")
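The wrapper relies on min-max scaling being exactly invertible. A small base R sketch of that round trip, using made-up response values (y is hypothetical, not taken from BostonHousing), looks like this:

```r
# Hypothetical response values.
y <- c(5, 12.5, 27, 50)

# Min and max, stored the same way as one column of pre_mdl$ranges.
bounds <- c(min(y), max(y))

# Forward transform to [0, 1] ...
y_scaled <- (y - bounds[1]) / diff(bounds)

# ... and the inverse used in convert_response().
y_back <- y_scaled * diff(bounds) + min(bounds)

print(y_scaled)
print(all.equal(y, y_back))
```

The inverse has to use the bounds learned from the training data, not bounds recomputed from the predictions.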