
Correlation with discrete and categorical variables in R

I am analyzing a dataset that has both numeric and factor variables. I would like to know the correlations so I can choose the best variables.

str(data)
$ Ag                    : num [1:1470] 41 49 37 33 27 32 59 30 38 36 ...
 $ Ay              : Factor w/ 2 levels "No","Yes": 2 1 2 1 1 1 1 1 1 1 ...
 $ Bu        : Factor w/ 3 levels "Non-Travel","Travel_Frequently",..: 3 2 3 2 3 2 3 3 2 3 ...
 $ Di       : num [1:1470] 1 8 2 3 2 2 3 24 23 27 ...
 $ Ed               : num [1:1470] 2 1 2 4 1 2 3 1 3 3 ...
 $ Ep          : num [1:1470] 1 1 1 1 1 1 1 1 1 1 ...
 $ Em          : num [1:1470] 1 2 4 5 7 8 10 11 12 13 ...
 $ Ge                : Factor w/ 2 levels "Female","Male": 1 2 2 1 2 2 1 2 2 2 ...
 $ Ho             : num [1:1470] 94 61 92 56 40 79 81 67 44 94 ...
 $ J1         : num [1:1470] 3 2 2 3 3 3 4 3 2 3 ...
 $ J2               : num [1:1470] 2 2 1 1 1 1 1 1 3 2 ...

When I execute this (although I want correlations for all of the data, not only the numeric columns):

cor(data[sapply(data, is.numeric)])

I get this warning:

Warning message:
In cor(data[sapply(data, is.numeric)]) :
  the standard deviation is zero

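The warning just politely lets you know that you set out to calculate a correlation where one of the variables is constant. A constant variable has a standard deviation of zero, so its correlation with anything else is undefined, and computing it is usually pointless. In the str() output above, Ep looks like a candidate: its first ten values are all 1.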
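To see the warning in isolation, here is a minimal sketch on a made-up data frame (the name toy and its columns a, b and c are hypothetical, not taken from the question's data), where b is constant:

```r
# Toy data frame (hypothetical, not the asker's data): column `b` is constant.
toy <- data.frame(a = c(1, 2, 3, 4),
                  b = c(1, 1, 1, 1),
                  c = c(2, 4, 5, 9))

# Standard deviation of each column; the value for `b` is exactly 0.
sds <- sapply(toy, sd)
sds

# cor() would raise the same warning here; it is silenced so the result can be inspected.
m <- suppressWarnings(cor(toy))

# The correlation between `a` and the constant column `b` is NA,
# while the correlation between `a` and `c` is computed normally.
m["a", "b"]
m["a", "c"]
```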

Just filter those columns out as well:


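# keep only the numeric columns
x1 <- data[sapply(data, is.numeric)]
# drop constant columns, i.e. those whose standard deviation is zero
x2 <- x1[sapply(x1, sd) != 0]

cor(x2)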
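The filter above only covers the numeric columns, while the question also asks about the factors. One possible approach, which is an assumption on my part and not something the answer proposes, is to expand the factors into 0/1 indicator columns with model.matrix() and then apply the same zero-variance filter. The data frame toy and its column names below are hypothetical stand-ins for the real data:

```r
# Hypothetical stand-in for `data`: one numeric column, one constant column, two factors.
toy <- data.frame(
  age    = c(41, 49, 37, 33, 27, 32),
  const  = rep(1, 6),
  gender = factor(c("Female", "Male", "Male", "Female", "Male", "Male")),
  travel = factor(c("Rarely", "Frequently", "Rarely", "Frequently", "Non", "Rarely"))
)

# Expand factors into 0/1 indicator columns (one per non-reference level).
# [, -1] drops the intercept column that model.matrix() adds.
mm <- model.matrix(~ ., data = toy)[, -1, drop = FALSE]

# Drop constant columns, same idea as in the answer, then correlate.
mm <- mm[, apply(mm, 2, sd) != 0, drop = FALSE]
res <- cor(mm)
res
```

For a two-level factor the resulting value is the ordinary Pearson correlation with a 0/1 indicator. For a factor with more levels you get one column per non-reference level, so read those entries level by level and not as a single correlation for the whole factor.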
