
How to normalize data in R

This is my data:

a       b       c     d         e           f           g
<dbl>   <dbl>   <dbl> <dbl>     <dbl>       <dbl>       <dbl>
14.6    74529   720   4639.341  10039.323   0.3089194   0.00011135818
270.0   74529   720   4639.341  10039.323   0.3089194   0.00011135818
14.6    74529   720   4639.341  10039.323   0.3089194   0.00011135818
390.0   74529   720   4639.341  10039.323   0.3089194   0.00011135818
2000.0  74529   720   4639.341  10039.323   0.3089194   0.00011135818
2452.0  74529   720   4639.341  10039.323   0.3089194   0.00011135818
10315.0 74529   720   4639.341  10039.323   0.3089194   0.00011135818
190.6   74529   720   4639.341  10039.323   0.3089194   0.00011135818
1050.0  74529   720   4639.341  10039.323   0.3089194   0.00011135818
14.6    74529   720   4639.341  10039.323   0.3089194   0.00011135818
...

Let's say I want to create a new variable by adding other variables together. However, since the variables are not on comparable scales, I need to rescale them first. The variables are not normally distributed, and the normalization also needs to be robust to outliers. So what is the best way to normalize the data so that I can sum the variables to create a new parameter?

Use scale(x). To deal with outliers, discard scaled values beyond a certain threshold; e.g., which(abs(scale(x)) > 3) returns the indices of the values that lie more than 3 standard deviations from the mean.

Do this for every column, form the union of the outlier rows found across all columns, and discard those rows before you proceed.
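The steps above could be sketched roughly as follows. The data frame df, its two columns, the helper find_outliers, and the new column total are all hypothetical names made up for illustration; the threshold of 3 is the one suggested above. The sketch assumes every column is numeric and not constant, since scale() returns NaN for a column whose standard deviation is 0.

```r
# Hypothetical example data (not the data from the question):
# row 20 is extreme in column a, row 3 is extreme in column b.
df <- data.frame(
  a = c(1:19, 1000),
  b = replace(rep(c(10, 12, 14, 16), 5), 3, 900)
)

# For one column, return the row indices whose z-score exceeds the threshold.
find_outliers <- function(x, threshold = 3) {
  z <- as.numeric(scale(x))   # (x - mean(x)) / sd(x)
  which(abs(z) > threshold)   # which() silently skips NA/NaN
}

# Union of the outlier rows over all columns.
outlier_rows <- sort(unique(unlist(lapply(df, find_outliers))))

# Drop those rows. Logical indexing is used so that an empty
# outlier set keeps every row (df[-integer(0), ] would keep none).
keep <- !(seq_len(nrow(df)) %in% outlier_rows)
df_clean <- df[keep, , drop = FALSE]

# Rescale the remaining rows and sum the scaled columns.
scaled <- scale(df_clean)
df_clean$total <- rowSums(scaled)
```

Note that the columns are rescaled again after the outlier rows are removed, so the mean and standard deviation used for the final sum are no longer influenced by the discarded values.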
