I've got a huge data set with six columns (call them A, B, C, D, E, F), about 450,000 rows. I simply tried to find the correlation between columns A and B:
cor(A, B)
and I got
[1] NA
as a result. What can I do to fix this problem?
Try cor(A, B, use = "pairwise.complete.obs"). By default cor returns NA if either vector contains any NA; with this setting it uses only the rows where both A and B are present.
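To see the difference on a small scale, here is a minimal sketch with made-up vectors (the values are purely illustrative, not taken from the question's data):

```r
# Made-up data: two short vectors, each with one missing value
A <- c(1, 2, NA, 4, 5, 6)
B <- c(2, 4, 6, NA, 10, 12)

# Default is use = "everything": any NA in the inputs gives NA
default_result <- cor(A, B)
print(default_result)

# Only rows where both A and B are non-missing are used (rows 1, 2, 5, 6)
pairwise_result <- cor(A, B, use = "pairwise.complete.obs")
print(pairwise_result)
```

With only two vectors, "complete.obs" and "pairwise.complete.obs" give the same answer; they differ only when cor is given a matrix or data frame with more than two columns.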
To be statistically rigorous, you should also look at the number of missing entries in your data and check whether the missing-at-random assumption holds.
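A rough way to start on this is to count the missing values and see whether one variable looks different when the other is missing. The sketch below uses simulated data standing in for A and B; the group comparison at the end is only a crude first look, not a formal test of the missingness mechanism.

```r
# Simulated stand-ins for columns A and B (assumption: not the real data)
set.seed(1)
A <- rnorm(100)
B <- rnorm(100)
A[sample(100, 10)] <- NA   # knock out 10 values of A
B[sample(100, 5)]  <- NA   # knock out 5 values of B

n_missing_A <- sum(is.na(A))             # missing entries in A
n_missing_B <- sum(is.na(B))             # missing entries in B
n_complete  <- sum(complete.cases(A, B)) # rows usable for cor(A, B)

print(c(A = n_missing_A, B = n_missing_B, complete = n_complete))

# Crude check: mean of B where A is observed (FALSE) vs. missing (TRUE).
# A large gap would hint that missingness in A is related to B.
b_by_missing_a <- tapply(B, is.na(A), mean, na.rm = TRUE)
print(b_by_missing_a)
```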
Edit 1: Take a look at ?cor to see other options for the use parameter.
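As an illustration of how some of those options behave, here is a small sketch on a made-up three-column data frame (the column names follow the question, the values are invented):

```r
# Made-up data frame with one NA in each column, on different rows
df <- data.frame(
  A = c(1, 2, NA, 4, 5, 6),
  B = c(2, 1, 4, NA, 6, 5),
  C = c(3, 1, 2, 5, NA, 6)
)

# "complete.obs": drop every row that has an NA in any column,
# then compute all correlations on the remaining rows
complete_cor <- cor(df, use = "complete.obs")
print(complete_cor)

# "pairwise.complete.obs": each pair of columns uses the rows
# that are complete for that pair, so pairs may use different rows
pairwise_cor <- cor(df, use = "pairwise.complete.obs")
print(pairwise_cor)

# "all.obs": missing values are treated as an error
all_obs <- tryCatch(cor(df, use = "all.obs"),
                    error = function(e) "error")
print(all_obs)
```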
You might consider using the rcorr function in the Hmisc package.
It is very fast and only includes pairwise complete observations. The returned object contains matrices of the correlations, the number of observations used for each pair, and the p-values.
Some example code is available here :
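For reference, a minimal sketch of calling rcorr on two columns with missing values. It assumes the Hmisc package is installed, and the data are simulated for illustration rather than taken from the question.

```r
library(Hmisc)  # assumption: installed via install.packages("Hmisc")

# Simulated stand-ins for columns A and B
set.seed(42)
A <- rnorm(20)
B <- A + rnorm(20)
A[c(3, 7)] <- NA
B[11] <- NA

# rcorr takes a numeric matrix; NAs are handled pairwise
res <- rcorr(cbind(A, B), type = "pearson")

print(res$r)  # correlation matrix
print(res$n)  # number of observations used for each pair
print(res$P)  # p-values
```

The off-diagonal entry of res$r should agree with cor(A, B, use = "pairwise.complete.obs"), and res$n shows how many rows actually went into it.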