What is the correct syntax to standardize only some of the variables in the dataset (R)?

At first I tried:

Bank_sc <- preProcess(x = Bank,
                      method = c("center", "scale"), 
                      select=c(Age, Experience, Income, Family, CCAvg, Education, Mortgage))

I left one variable out of the list, but it was standardized anyway. I cannot find any documentation on the proper syntax for this, so please help.
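
For the caret route specifically: as far as I can tell, `preProcess()` has no `select` argument, so the extra argument is silently swallowed by `...` and every numeric column is transformed. Note also that `preProcess()` only estimates the centering and scaling parameters; `predict()` is what applies them. A minimal sketch, assuming caret is installed and using a small made-up `Bank` data frame (the values and the `ZIP` column are hypothetical stand-ins, not the real data):

```r
library(caret)

# Hypothetical stand-in for the real Bank data
Bank <- data.frame(
  Age    = c(25, 45, 39, 35, 52, 31),
  Income = c(49, 34, 11, 100, 45, 29),
  ZIP    = c(91107, 90089, 94720, 94112, 91330, 92121)
)

cols <- c("Age", "Income")   # columns to standardize; ZIP is left alone

# Estimate centering/scaling parameters on the chosen columns only
pp <- preProcess(Bank[cols], method = c("center", "scale"))

# Apply them and write the result back into a copy of the data
Bank_sc <- Bank
Bank_sc[cols] <- predict(pp, Bank[cols])
```

Subsetting before calling `preProcess()` keeps the selection explicit, so nothing outside `cols` can be touched.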

You can use the dplyr package:

data <- data.frame(x = sample(1:100, 30), y = sample(1:100, 30), z = sample(1:100, 30))

head(data)
   x  y  z
1 26 60 16
2 38 52 51
3 12 25 13
4 32 78 54
5  6 71 59
6 10 83  3

library(dplyr)

data <- data %>% mutate_at(vars(x, y), scale)

head(data)
           x          y  z
1 -0.6630489  0.1550407 16
2 -0.2522096 -0.1088584 51
3 -1.1423613 -0.9995179 13
4 -0.4576293  0.7488137 54
5 -1.3477809  0.5179020 59
6 -1.2108345  0.9137507  3
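
In dplyr 1.0.0 and later, `mutate_at()` is superseded by `across()`. A sketch of the same idea in that style follows; the `set.seed()` call and the `scaled` name are assumptions added here so the example is reproducible, and `as.numeric()` is used because `scale()` returns a one-column matrix rather than a plain vector:

```r
library(dplyr)

set.seed(1)  # assumption: seed added only to make this sketch reproducible
data <- data.frame(x = sample(1:100, 30), y = sample(1:100, 30), z = sample(1:100, 30))

# Standardize x and y; z is not listed, so it is left unchanged
scaled <- data %>%
  mutate(across(c(x, y), ~ as.numeric(scale(.x))))

head(scaled)
```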

Just scale the subset. No package needed. Example:

to_scale <- c('X2', 'X3')
dat[to_scale] <- scale(dat[to_scale])
dat
#         X1         X2         X3
# 1 19.14806  0.7135746  0.8318253
# 2 19.37075 -1.8396140 -0.9339183
# 3 12.86140  0.3759499 -0.3961600
# 4 18.30448  0.5798603  0.8457129
# 5 16.41746 -0.4692167  0.9450476
# 6 15.19096  0.6394459 -1.2925074

Data:

set.seed(42)
dat <- data.frame(matrix(runif(30, 10, 20), 6, 3))
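
As a quick check of the base R approach, the sketch below confirms that only the selected columns change, and shows how the centering and scaling values that `scale()` stores as attributes could be reused on new rows. The names `dat_sc` and `new_rows` are hypothetical and introduced only for illustration:

```r
set.seed(42)
dat <- data.frame(matrix(runif(30, 10, 20), 6, 3))

to_scale <- c("X2", "X3")
scaled <- scale(dat[to_scale])   # matrix carrying "scaled:center" and "scaled:scale"

dat_sc <- dat
dat_sc[to_scale] <- scaled

# Scaled columns should have mean ~0 and standard deviation 1
colMeans(dat_sc[to_scale])
sapply(dat_sc[to_scale], sd)

# Reuse the same parameters on new data (new_rows is a made-up example row)
centers <- attr(scaled, "scaled:center")
scales  <- attr(scaled, "scaled:scale")
new_rows <- data.frame(X1 = 15, X2 = 12, X3 = 18)
new_rows[to_scale] <- scale(new_rows[to_scale], center = centers, scale = scales)
```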

