
Cor function in R producing only NA values

I'm attempting to complete the week 2 assignment of the Coursera R course and cannot figure out what is wrong with my code: I get NA values when I run it. The goal is to calculate the correlation between sulfate and nitrate for the monitor locations where the number of completely observed cases (on all variables) is greater than the threshold. The function should return a vector of correlations for the monitors that meet the threshold requirement. I've made some basic debugging attempts (using print statements) and believe the issue is somewhere in the if statement. Any thoughts/help/suggestions?

The data files for the code are here: https://d396qusza40orc.cloudfront.net/rprog%2Fdata%2Fspecdata.zip

Code is below:

corr <- function(directory, threshold = 0) { 

    #list of all csv files
    filelist <- list.files(path = directory, pattern = ".csv", full.names = TRUE)

    #vector for values to be input into
    cor_vector <- numeric()

    #loop for each file in list
    for (i in 1:length(filelist)) {
            data <- read.csv(filelist[i])
            cc <- sum(complete.cases(data))

            if (cc > threshold){

                    compsulfate <- data[which(!is.na(data$sulfate)), ]
                    compnitrate <- data[which(!is.na(data$nitrate)), ]

                    cor_vector <- c(cor_vector, cor(data$sulfate,data$nitrate))

            }

    }

    return(cor_vector)
}
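A small sketch of why this loop returns NA, using made-up vectors rather than specdata. By default 'cor' uses use = "everything", so a single NA in either argument makes the result NA. Filtering each column separately does not help either, because the two filtered vectors can have different lengths.

```r
# Made-up values standing in for one monitor's columns (not from specdata)
sulfate <- c(1.2, NA, 3.1, 4.0, 2.2, NA)
nitrate <- c(0.5, 0.9, NA, 1.8, 1.1, 1.4)

# 1. Default use = "everything": one NA anywhere makes the result NA
r_default <- cor(sulfate, nitrate)
print(r_default)  # NA

# 2. Filtering each column on its own leaves 4 sulfate values and
#    5 nitrate values, so cor() cannot pair them up and raises an error
s_only <- sulfate[!is.na(sulfate)]
n_only <- nitrate[!is.na(nitrate)]
mismatch_error <- tryCatch({ cor(s_only, n_only); FALSE },
                           error = function(e) TRUE)
print(mismatch_error)  # TRUE

# 3. Keeping only the positions where BOTH values are observed (1, 4, 5)
ok <- complete.cases(sulfate, nitrate)
r_both <- cor(sulfate[ok], nitrate[ok])
print(r_both)
```

The third step is the idea the answers below build on: drop whole rows, not individual values.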

One issue is that the objects you create, 'compsulfate' and 'compnitrate', are never used in the correlation calculation. 'cor(data$sulfate, data$nitrate)' still receives the full columns, and with its default use = "everything" any NA in either column makes the result NA. Removing the NA elements from each column separately is not a fix either, because the two filtered sets can end up with different lengths (and misaligned rows). A better option is to drop the rows that have an NA in either column and then call 'cor':

corr <- function(directory, threshold = 0) {

    # list of all csv files (pattern anchored to the .csv extension)
    filelist <- list.files(path = directory, pattern = "\\.csv$", full.names = TRUE)

    # vector to collect the correlations
    cor_vector <- numeric()

    # loop over each file in the list
    for (i in seq_along(filelist)) {
        data <- read.csv(filelist[i])

        # number of rows with no missing value in any column
        cc <- sum(complete.cases(data))

        if (cc > threshold) {
            # keep only the rows where both sulfate and nitrate are observed
            dataN <- data[complete.cases(data[c('sulfate', 'nitrate')]), ]

            cor_vector <- c(cor_vector, cor(dataN$sulfate, dataN$nitrate))
        }
    }

    return(cor_vector)
}

dir1 <- "/home/akrun/Downloads/specdata/"    
out <-  corr(dir1, 0)
head(out)
#[1] -0.22255256 -0.01895754 -0.14051254 -0.04389737 -0.06815956 -0.12350667 
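To try this logic without downloading specdata, the sketch below writes two small made-up monitor files to a temporary directory and runs the same approach on them. The file names '001.csv' and '002.csv' and the column layout (Date, sulfate, nitrate, ID) are assumptions meant to mimic specdata, and the function is restated so the block runs on its own.

```r
# Hypothetical monitor files in a throwaway directory,
# assuming the specdata column layout Date, sulfate, nitrate, ID
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)

# Monitor 001: rows 1, 4, 5 and 6 are complete (4 complete cases)
write.csv(data.frame(Date = sprintf("2003-01-%02d", 1:6),
                     sulfate = c(1.2, NA, 3.1, 4.0, 2.2, 5.0),
                     nitrate = c(0.5, 0.9, NA, 1.8, 1.1, 2.0),
                     ID = 1),
          file.path(demo_dir, "001.csv"), row.names = FALSE)

# Monitor 002: only row 2 is complete (1 complete case)
write.csv(data.frame(Date = sprintf("2003-01-%02d", 1:3),
                     sulfate = c(NA, 2.0, NA),
                     nitrate = c(1.0, 3.0, NA),
                     ID = 2),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

# Same logic as the answer above, restated so this block is self-contained
corr <- function(directory, threshold = 0) {
  filelist <- list.files(path = directory, pattern = "\\.csv$", full.names = TRUE)
  cor_vector <- numeric()
  for (f in filelist) {
    data <- read.csv(f)
    if (sum(complete.cases(data)) > threshold) {
      # keep only the rows where both pollutants were observed
      dataN <- data[complete.cases(data[c("sulfate", "nitrate")]), ]
      cor_vector <- c(cor_vector, cor(dataN$sulfate, dataN$nitrate))
    }
  }
  cor_vector
}

res1 <- corr(demo_dir, threshold = 1)  # only monitor 001 passes (4 > 1)
res5 <- corr(demo_dir, threshold = 5)  # no monitor passes
print(res1)
print(res5)  # numeric(0)
```

Note that the threshold check uses a strict '>', so a monitor with exactly 'threshold' complete cases is excluded.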

Alternatively, the function 'cor' has a 'use' argument that can be set to 'pairwise.complete.obs', so that 'cor' itself skips every pair in which either value is NA:

corr <- function(directory, threshold = 0) {

    # list of all csv files (pattern anchored to the .csv extension)
    filelist <- list.files(path = directory, pattern = "\\.csv$", full.names = TRUE)

    # vector to collect the correlations
    cor_vector <- numeric()

    # loop over each file in the list
    for (i in seq_along(filelist)) {
        data <- read.csv(filelist[i])
        cc <- sum(complete.cases(data))

        if (cc > threshold) {
            # compsulfate/compnitrate are no longer needed: cor() itself
            # drops the pairs where either value is NA
            cor_vector <- c(cor_vector,
                            cor(data$sulfate, data$nitrate, use = 'pairwise.complete.obs'))
        }
    }

    return(cor_vector)
}
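A quick check with made-up vectors that the two answers agree. For exactly two vectors, both use = "pairwise.complete.obs" and use = "complete.obs" keep only the positions where both values are observed, so they give the same result as filtering with 'complete.cases' first. The two options differ only when 'cor' is given a matrix or data frame with three or more columns.

```r
# Made-up vectors with NAs in different positions (not from specdata)
sulfate <- c(1.2, NA, 3.1, 4.0, 2.2, 5.0)
nitrate <- c(0.5, 0.9, NA, 1.8, 1.1, 2.0)

# Manual filtering, as in the first answer
ok <- complete.cases(sulfate, nitrate)
r_manual <- cor(sulfate[ok], nitrate[ok])

# Letting cor() drop the incomplete pairs itself, as in the second answer
r_pairwise <- cor(sulfate, nitrate, use = "pairwise.complete.obs")
r_complete <- cor(sulfate, nitrate, use = "complete.obs")

# Same value three times, up to floating-point rounding
print(c(r_manual, r_pairwise, r_complete))
```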

Summary of the returned vector (stored in 'foo'):

R> summary(foo)
 Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-1.00   -0.05    0.11    0.14    0.28    1.00 

R> stem(foo)

The decimal point is 1 digit(s) to the left of the |

-10 | 0
 -9 | 
 -8 | 
 -7 | 
 -6 | 
 -5 | 5
 -4 | 6
 -3 | 0
 -2 | 9222211
 -1 | 88888888776666665544444332222111111110
 -0 | 99999999888888887777777666666665555544444444433333322222221111110000
  0 | 00111111222222223444445566677779999999
  1 | 000000111122222222233333344444455555666678888899999
  2 | 00001111333334444555566666667777788888999
  3 | 0000001222455566667778889
  4 | 011123334456677789
  5 | 12222355678999
  6 | 0012227
  7 | 011233669
  8 | 9
  9 | 1
 10 | 0
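The -1.00 minimum and 1.00 maximum in the summary deserve a second look. One likely cause (an assumption, not checked against the data here) is monitors with only two complete cases: any two points lie on a straight line, so their Pearson correlation is exactly +1 or -1. A quick check with made-up numbers:

```r
# Two complete (sulfate, nitrate) pairs with made-up values:
# Pearson's r for two points is always +1 or -1
r_two_up   <- cor(c(1.0, 2.5), c(0.3, 0.9))  # y rises with x
r_two_down <- cor(c(1.0, 2.5), c(0.9, 0.3))  # y falls as x rises
print(c(r_two_up, r_two_down))
```

If that is what is happening, a larger threshold would filter such monitors out.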

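Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.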

 