
Create a matrix from a function and two numeric data frames

I'm trying to create matrices of various distance/association functions in R. I have a function similar to cor that gives the association between two vectors. Now I want to take a data frame (or matrix) of numeric vectors, something like mtcars, and create a matrix from the function and the data frame. I thought this is what outer is for, but I am not getting it to work. Here's an attempt using cor and mtcars.

cor(mtcars$mpg, mtcars$cyl)  #a function that gives an association between two vectors                  
outer(mtcars, mtcars, "cor") #the attempt to create a matrix of all vectors in a df

Yes, I know that cor can do this directly; let's pretend it can't, and that cor only finds the correlation between two vectors.

So the final goal would be to get the matrix you get from cor(mtcars).

Thank you in advance.

You can use outer with a function that takes column names or column numbers as arguments.

outer(
  names(mtcars), 
  names(mtcars), 
  Vectorize(function(i,j) cor(mtcars[,i],mtcars[,j]))
)
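As a minimal sketch of the same idea with any two-vector function: assoc below is a hypothetical stand-in for your own association function (here it simply wraps cor). Since outer is called on an unnamed character vector, the result has no dimnames, so they are set by hand before comparing against cor(mtcars).

```r
# assoc is a hypothetical placeholder for a custom two-vector function.
assoc <- function(x, y) cor(x, y)

vars <- names(mtcars)

# Vectorize() wraps the function so that it is applied element-wise to the
# expanded vectors of column names that outer() passes in.
pair_fun <- Vectorize(function(i, j) assoc(mtcars[[i]], mtcars[[j]]))

res <- outer(vars, vars, pair_fun)
dimnames(res) <- list(vars, vars)  # label rows and columns

all.equal(res, cor(mtcars))  # sanity check against the built-in result
```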

outer is not directly up to the job. It expands its X and Y arguments into two long vectors and calls the function once on those, rather than once per pair. EDIT As @Vincent Zoonekynd shows, you can adapt it to work.
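A small sketch of that behaviour, using a made-up function f that counts how often it is called (calls and f are illustrative names only):

```r
calls <- 0

# f records each call and then does ordinary element-wise multiplication.
f <- function(x, y) {
  calls <<- calls + 1
  x * y
}

res <- outer(1:3, 1:4, f)

calls     # f was called a single time, on two vectors of length 12
dim(res)  # the result is then reshaped into a 3 x 4 matrix
```

This is why a function that collapses two vectors to a single number, such as cor, needs a wrapper like Vectorize before outer can use it.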

Otherwise, a rather simple loop does the trick:

m <- as.matrix(mtcars)
r <- matrix(1, ncol(m), ncol(m), dimnames=list(colnames(m), colnames(m)))
for(i in 1:(ncol(m)-1)) {
  for(j in (i+1):ncol(m)) {
     r[i,j] <- cor(m[,i], m[,j])
     r[j,i] <- r[i,j]
  }
}

all.equal(r, cor(m)) # Sanity check...

r # print resulting 11x11 correlation matrix

Here I assume your association function is symmetric and that cor(x, x) == 1.
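If those two assumptions do not hold for your function, a sketch along the following lines evaluates every cell, diagonal included. pairwise_matrix is a hypothetical helper name, not something from a package, and it does roughly twice the work of the loop above.

```r
# Hypothetical helper: apply fun to every ordered pair of columns of df.
pairwise_matrix <- function(df, fun) {
  p <- ncol(df)
  out <- matrix(NA_real_, p, p, dimnames = list(names(df), names(df)))
  for (i in seq_len(p)) {
    for (j in seq_len(p)) {
      out[i, j] <- fun(df[[i]], df[[j]])  # no symmetry or diagonal shortcut
    }
  }
  out
}

r_full <- pairwise_matrix(mtcars, cor)
all.equal(r_full, cor(mtcars))  # sanity check
```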

UPDATE Since Vincent's solution is so much more elegant, I have to counter with the fact that mine is 2x faster :-)

# Huge data frame (1e6 rows, 10 cols)
d <- data.frame(matrix(1:1e7, ncol=10))

# Vincent's solution    
system.time(outer(
  names(d), 
  names(d), 
  Vectorize(function(i,j) cor(d[,i],d[,j]))
)) # 2.25 secs

# My solution    
system.time({
m <- d
r <- matrix(1, ncol(m), ncol(m), dimnames=list(colnames(m), colnames(m)))
for(i in 1:(ncol(m)-1)) {
  for(j in (i+1):ncol(m)) {
     r[i,j] <- cor(m[,i], m[,j])
     r[j,i] <- r[i,j]
  }
}
}) # 1.0 secs


 