Cor function in R producing only NA value

I'm attempting to complete the week 2 assignment of the Coursera R course and cannot figure out what is wrong with my code: I get only NA values when I run it. The goal is to calculate the correlation between sulfate and nitrate for monitor locations where the number of completely observed cases (on all variables) is greater than the threshold. The function should return a vector of correlations for the monitors that meet the threshold requirement. I've made some basic debugging attempts (using print statements) and believe the issue is somewhere in the if statement. Any thoughts, help, or suggestions?

The data files for the code are here: https://d396qusza40orc.cloudfront.net/rprog%2Fdata%2Fspecdata.zip

Code is below:

corr <- function(directory, threshold = 0) { 

    #list of all csv files
    filelist <- list.files(path = directory, pattern = ".csv", full.names = TRUE)

    #vector for values to be input into
    cor_vector <- numeric()

    #loop for each file in list
    for (i in 1:length(filelist)) {
            data <- read.csv(filelist[i])
            cc <- sum(complete.cases(data))

            if (cc > threshold){

                    compsulfate <- data[which(!is.na(data$sulfate)), ]
                    compnitrate <- data[which(!is.na(data$nitrate)), ]

                    cor_vector <- c(cor_vector, cor(data$sulfate,data$nitrate))

            }

    }

    return(cor_vector)
}

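One issue is that the objects you create, 'compsulfate' and 'compnitrate', are never used in the correlation calculation: cor() is still called on the original columns, which contain NA, so it returns NA. Removing the NA elements from each column separately would not help either, because the two results can end up with different lengths. A better option is to keep only the rows where both columns are non-NA and then compute the correlation: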

corr <- function(directory, threshold = 0) { 

    #list of all csv files
    filelist <- list.files(path = directory, pattern = ".csv", full.names = TRUE)

    #vector for values to be input into
    cor_vector <- numeric()

    #loop for each file in list
    for (i in 1:length(filelist)) {
            data <- read.csv(filelist[i])

            cc <- sum(complete.cases(data))

            if (cc > threshold){

                    dataN <- data[complete.cases(data[c('sulfate', 'nitrate')]), ]

                    cor_vector <- c(cor_vector, cor(dataN$sulfate, dataN$nitrate))

            }

    }

    return(cor_vector)

}

dir1 <- "/home/akrun/Downloads/specdata/"    
out <-  corr(dir1, 0)
head(out)
#[1] -0.22255256 -0.01895754 -0.14051254 -0.04389737 -0.06815956 -0.12350667 
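To see why the original call returns NA and why row-wise filtering fixes it, here is a minimal sketch on a small made-up data frame. The name `toy` and its values are assumptions for illustration only; they are not taken from the specdata files.

```r
# Toy data, invented for illustration (not from specdata)
toy <- data.frame(
  sulfate = c(1, 2, NA, 4, 5, 6),
  nitrate = c(2, NA, 3, 8, NA, 12)
)

# With the default use = "everything", any NA in the inputs gives NA
r_default <- cor(toy$sulfate, toy$nitrate)

# Dropping NAs from each column separately leaves vectors of different
# lengths that no longer line up row by row
n_sulfate <- sum(!is.na(toy$sulfate))  # 5 values left
n_nitrate <- sum(!is.na(toy$nitrate))  # 4 values left

# Keeping only rows where both columns are observed preserves the pairing
both <- toy[complete.cases(toy[c("sulfate", "nitrate")]), ]
r_complete <- cor(both$sulfate, both$nitrate)

print(r_default)
print(c(n_sulfate, n_nitrate, nrow(both)))
print(r_complete)
```

In this toy data nitrate is exactly twice sulfate on the three complete rows, so the filtered correlation comes out as 1, while the unfiltered call stays NA.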

The function 'cor' has a 'use' parameter that can be set to 'pairwise.complete.obs', so that only the rows where both values are present are used and no manual filtering is needed:

corr <- function(directory, threshold = 0) { 

    #list of all csv files
    filelist <- list.files(path = directory, pattern = ".csv", full.names = TRUE)

    #vector for values to be input into
    cor_vector <- numeric()

    #loop for each file in list
    for (i in 1:length(filelist)) {
            data <- read.csv(filelist[i])
            cc <- sum(complete.cases(data))

            if (cc > threshold){

                    # compsulfate <- data[which(!is.na(data$sulfate)), ]
                    # compnitrate <- data[which(!is.na(data$nitrate)), ]

                    cor_vector <- c(cor_vector, cor(data$sulfate, data$nitrate,
                                                    use = 'pairwise.complete.obs'))

            }

    }

    return(cor_vector)
}
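As a quick check that the 'use' argument and manual filtering agree when only two vectors are involved, here is a small sketch. The data frame `toy2` and its values are made up for illustration and are not part of the assignment data.

```r
# Toy data, invented for illustration (not from specdata)
toy2 <- data.frame(
  sulfate = c(3.1, NA, 2.4, 5.0, 4.2, NA, 1.8),
  nitrate = c(0.9, 1.2, NA, 1.7, 1.1, 0.4, 0.6)
)

# Let cor() drop the incomplete pairs itself
r_pairwise <- cor(toy2$sulfate, toy2$nitrate, use = "pairwise.complete.obs")

# Same idea with casewise deletion
r_casewise <- cor(toy2$sulfate, toy2$nitrate, use = "complete.obs")

# Manual filtering, as in the complete.cases() version of corr()
kept <- toy2[complete.cases(toy2[c("sulfate", "nitrate")]), ]
r_manual <- cor(kept$sulfate, kept$nitrate)

# All three should agree for a single pair of vectors
print(c(r_pairwise, r_casewise, r_manual))
```

The two 'use' settings only start to differ when cor() is given a matrix or data frame with more than two columns, where pairwise deletion can use a different set of rows for each pair of columns.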

R> summary(foo)
 Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-1.00   -0.05    0.11    0.14    0.28    1.00 

R> stem(foo)

The decimal point is 1 digit(s) to the left of the |

-10 | 0
 -9 | 
 -8 | 
 -7 | 
 -6 | 
 -5 | 5
 -4 | 6
 -3 | 0
 -2 | 9222211
 -1 | 88888888776666665544444332222111111110
 -0 | 99999999888888887777777666666665555544444444433333322222221111110000
  0 | 00111111222222223444445566677779999999
  1 | 000000111122222222233333344444455555666678888899999
  2 | 00001111333334444555566666667777788888999
  3 | 0000001222455566667778889
  4 | 011123334456677789
  5 | 12222355678999
  6 | 0012227
  7 | 011233669
  8 | 9
  9 | 1
 10 | 0
