This is my data:
a b c d e f g
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
14.6 74529 720 4639.341 10039.323 0.3089194 0.00011135818
270.0 74529 720 4639.341 10039.323 0.3089194 0.00011135818
14.6 74529 720 4639.341 10039.323 0.3089194 0.00011135818
390.0 74529 720 4639.341 10039.323 0.3089194 0.00011135818
2000.0 74529 720 4639.341 10039.323 0.3089194 0.00011135818
2452.0 74529 720 4639.341 10039.323 0.3089194 0.00011135818
10315.0 74529 720 4639.341 10039.323 0.3089194 0.00011135818
190.6 74529 720 4639.341 10039.323 0.3089194 0.00011135818
1050.0 74529 720 4639.341 10039.323 0.3089194 0.00011135818
14.6 74529 720 4639.341 10039.323 0.3089194 0.00011135818
...
Let's say I want to create a new variable by adding several existing variables together. Because the variables are on very different scales, I need to rescale them first. Their distributions are not normal, and the normalization should also be robust to outliers. What is the best way to normalize the data so that I can sum the variables into a new parameter?
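Use scale(x), which centers each column on its mean and divides it by its standard deviation. To deal with outliers, discard observations whose scaled values exceed a chosen threshold in absolute value; e.g., which(abs(scale(x)) > 3) returns the indices of values lying more than 3 standard deviations from the mean. Do this for every column, take the union of the flagged rows across all columns, and discard those rows before you proceed.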
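The steps above can be sketched roughly as follows. This is only an illustration: the data frame df and its values are made up (the question's full data is not shown), and the threshold of 3 is the one suggested in the answer, not a recommendation for every data set.

```r
# Hypothetical toy data; the values are invented for illustration only.
set.seed(1)
df <- data.frame(
  a = c(rnorm(50, mean = 100, sd = 10), 10000),  # last row is an obvious outlier
  b = c(rnorm(50, mean = 5, sd = 1), 5),
  c = c(rnorm(50, mean = 0.3, sd = 0.05), 0.3)
)

# Standardize every column: (x - mean) / sd, column by column.
z <- scale(df)

# Rows where at least one column lies more than 3 sd from its mean,
# i.e. the union of the per-column outlier sets.
outlier_rows <- which(apply(abs(z) > 3, 1, any))

# Drop the flagged rows (if any), then standardize the remaining data again
# so the means and sds are no longer influenced by the removed rows.
clean <- if (length(outlier_rows) > 0) df[-outlier_rows, , drop = FALSE] else df
z_clean <- scale(clean)

# New variable: the sum of the standardized columns.
clean$total <- rowSums(z_clean)
```

Two caveats: scale() returns NaN for a column with zero variance, so constant columns need to be dropped or handled before summing. Also, the mean and sd are themselves pulled by outliers; if that is a concern, centering on median(x) and dividing by mad(x) is one possible variation on the same idea.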