
cor function returning a single value instead of multiple values in R

I am trying to apply cor function to a data set. Below is my code:

corr <- function(directory, threshold = 0) {
      for (i in 1:332) {
      data = read.csv(paste(directory, '/',
          formatC(i, width = 3, flag = '0'), '.csv', sep = '')) # reading all files
      }
      cv = numeric() #initializing list
      data = na.omit(data) #omitting NAs from read file
      if (nrow(data) > threshold) { 
          cv = c(cv, cor(data[,2], data[,3])) #if number of rows more than threshold, get correlation of data
      }
     cv
 }

At the command line, I then call:

cr <- corr('specdata', 150)
head(cr)

My expected output is:

[1] -0.01896 -0.14051 -0.04390 -0.06816 -0.12351 -0.07589

but the return value I get is only:

[1] -0.01896

I don't fully understand cor or why I am getting this result. Please help. All my CSV files contain normal tables. Thank you!

For two vectors x and y, cor(x,y) returns the correlation coefficient of x and y, which is just a single number. This is what your code is doing.

cor(1:10, 2:11) # returns 1.0
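In the question's function, cor is also called only once, after the loop has finished, so only the last file that was read contributes to the result. A minimal sketch of the difference, using three made-up in-memory data frames instead of CSV files (the names `frames`, `once`, and `per_frame` are illustrative and not from the question):

```r
# Three small made-up data frames standing in for the CSV files.
frames <- list(
  data.frame(x = 1:5, y = c(2, 4, 5, 4, 5)),
  data.frame(x = 1:5, y = c(5, 3, 4, 2, 1)),
  data.frame(x = 1:5, y = c(1, 3, 2, 5, 4))
)

# Pattern from the question: the loop only overwrites `data`,
# so cor() runs once, on the last data frame.
for (d in frames) {
  data <- d
}
once <- cor(data$x, data$y)
length(once)       # 1

# Calling cor() inside the loop gives one value per data frame.
per_frame <- numeric()
for (d in frames) {
  per_frame <- c(per_frame, cor(d$x, d$y))
}
length(per_frame)  # 3
```

The single value from the first pattern equals the last element of `per_frame`, which is why the function in the question can never return more than one number.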

If you want more than one correlation from a single call, pass cor a data frame that contains your variables. For a data frame 'df' with (say) 3 columns, cor(df) returns a 3-by-3 matrix of pairwise correlations.

df <- data.frame(a=1:3, b=c(3,2,8), c=c(12,3,8))

cor(df)
           a         b          c
a  1.0000000 0.7777138 -0.4435328
b  0.7777138 1.0000000  0.2184630
c -0.4435328 0.2184630  1.0000000
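To pull a single pairwise coefficient out of that matrix, it can be indexed by column name. A small sketch, reusing the same example data frame (the variable `m` is just an illustrative name):

```r
df <- data.frame(a = 1:3, b = c(3, 2, 8), c = c(12, 3, 8))

m <- cor(df)   # 3-by-3 correlation matrix
dim(m)         # 3 3

# The ("a", "b") entry is the same number the two-vector call returns.
all.equal(m["a", "b"], cor(df$a, df$b))  # TRUE
```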

You have added a for loop in your edit. It seems you are trying to return the correlation coefficient for every CSV file in the directory.

We can try something like this.

df1 <- data.frame(x = rnorm(10), y = rnorm(10))
df2 <- data.frame(x = rnorm(10), y = rnorm(10))
df3 <- data.frame(x = rnorm(10), y = rnorm(10))

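# row.names = FALSE keeps the row index out of the file,
# so columns 1 and 2 of each CSV really are x and y
write.csv(df1, "1.csv", row.names = FALSE)
write.csv(df2, "2.csv", row.names = FALSE)
write.csv(df3, "3.csv", row.names = FALSE)

corr <- function(directory){
    # full.names = TRUE returns paths that include the directory,
    # so read.csv works even when directory is not the working directory
    temp = list.files(path = directory, pattern = "^[0-9]+\\.csv$", full.names = TRUE)
    # in your case
    # temp = list.files(path = directory, pattern = "^[0-9]{3}\\.csv$", full.names = TRUE)
    dat = lapply(temp, function(x){read.csv(x, header = TRUE)})
    # one correlation coefficient per file
    corlist <- lapply(dat, function(x){cor(x[,1], x[,2])})
    unlist(corlist)
}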

corr(".")

[1] 0.07766259 0.24449723 0.20367101
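The function above leaves out the threshold argument from the question. A minimal sketch that keeps it, assuming (as in the question's code) that columns 2 and 3 hold the two variables and that the threshold is compared against the number of complete rows; the name `corr_threshold`, the temporary directory, and the demo files are made up for illustration:

```r
# Hypothetical threshold-aware variant of the question's function.
corr_threshold <- function(directory, threshold = 0) {
  files <- list.files(directory, pattern = "^[0-9]+\\.csv$", full.names = TRUE)
  cv <- numeric()
  for (f in files) {
    data <- na.omit(read.csv(f))       # drop rows that contain NA
    if (nrow(data) > threshold) {      # keep files with enough complete rows
      cv <- c(cv, cor(data[, 2], data[, 3]))
    }
  }
  cv
}

# Demo on made-up files written to a temporary directory.
demo_dir <- file.path(tempdir(), "corr_demo")
dir.create(demo_dir, showWarnings = FALSE)
set.seed(1)
for (i in 1:3) {
  n <- c(5, 20, 30)[i]                 # number of rows in file i
  d <- data.frame(id = i, x = rnorm(n), y = rnorm(n))
  write.csv(d, file.path(demo_dir, sprintf("%03d.csv", i)), row.names = FALSE)
}

length(corr_threshold(demo_dir, threshold = 10))  # 2: the 5-row file is skipped
length(corr_threshold(demo_dir))                  # 3
```

Reading, filtering, and correlating all happen inside the loop, so each file that passes the threshold contributes exactly one value to the returned vector.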
