簡體   English   中英

使用R查找相關對

[英]Using R to find correlation pairs

           VZ.Close CBOU.Close SBUX.Close   T.Close
VZ.Close   1.0000000  0.5804478  0.8872978 0.9480894
CBOU.Close 0.5804478  1.0000000  0.7876277 0.4988890
SBUX.Close 0.8872978  0.7876277  1.0000000 0.8143305
T.Close    0.9480894  0.4988890  0.8143305 1.0000000

因此,假設我在股票價格之間具有這些相關性。 我想看看第一行,找到相關性最高的那對。 那將是VZ和T。然后,我想刪除這2只股票作為期權。 然后,在其余股票中找到相關性最高的貨幣對。 依此類推,直到所有股票配對。 在此示例中,顯然將是CBOU和SBUX,因為它們僅剩2個,但是我希望代碼能夠容納任意數量的對。

如果您想查看每一步的最大相關性,這是一個解決方案。 因此,第一步不僅要看第一行,還要看整個矩陣。

樣本數據 :

d <- matrix(runif(36),ncol=6,nrow=6)
rownames(d) <- colnames(d) <- LETTERS[1:6]
diag(d) <- 1
d
           A          B         C          D         E          F
A 1.00000000 0.65209204 0.8520392 0.26980214 0.5844000 0.69335143
B 0.73531603 1.00000000 0.5499431 0.60511580 0.7483990 0.14788134
C 0.56433218 0.27242769 1.0000000 0.07952776 0.2147628 0.03711562
D 0.91756919 0.04853523 0.5554490 1.00000000 0.4344089 0.23381447
E 0.06897889 0.80740821 0.7974340 0.87425643 1.0000000 0.74546072
F 0.19961474 0.61665231 0.2829632 0.58110694 0.7433924 1.00000000

和代碼:

results <- data.frame(v1=character(0), v2=character(0), cor=numeric(0), stringsAsFactors=FALSE)
diag(d) <- 0
while (sum(d>0)>1) {
  maxval <- max(d)
  max <- which(d==maxval, arr.ind=TRUE)[1,]
  results <- rbind(results, data.frame(v1=rownames(d)[max[1]], v2=colnames(d)[max[2]], cor=maxval))
  d[max[1],] <- 0
  d[,max[1]] <- 0
  d[max[2],] <- 0
  d[,max[2]] <- 0
}

這使 :

  v1 v2       cor
1  D  A 0.9175692
2  E  B 0.8074082
3  F  C 0.2829632

我認為這可以回答您的問題,但是我不能確定,因為原始問題有點含糊...

# Construct toy example of symmentrical matrix
# nc is number of rows/columns in matrix, in the problem above it was 4, but let's try with 6
nc <- 6
mat <- diag( 1 , nc )
# Create toy correlation data for matrix
dat <- runif( ( (nc^2-nc)/2 ) )
# Fill both triangles of matrix so it is symmetric
mat[lower.tri( mat ) ] <- dat 
mat[upper.tri( mat ) ] <- dat

# Create vector of random string names for row/column names
names <- replicate( nc , expr = paste( sample( c( letters , LETTERS ) , 3 , replace = TRUE ) , collapse = "" ) )
dimnames(mat) <- list( names , names )

# Sanity check
mat
    SXK   llq   xFL   RVW   oYQ   Seb
SXK 1.000 0.973 0.499 0.585 0.813 0.751
llq 0.973 1.000 0.075 0.533 0.794 0.826
xFL 0.499 0.099 1.000 0.099 0.481 0.968
RVW 0.075 0.813 0.620 1.000 0.620 0.307
oYQ 0.585 0.794 0.751 0.968 1.000 0.682
Seb 0.533 0.481 0.826 0.307 0.682 1.000

# Ok - to problem at hand , you can just substitute your matrix into these lines:
# Clearly the diagonal in a correlation matrix will be 1 so this is excluded as per your problem
diag( mat ) <- NA
# Now find the next highest correlation in each row and set this to NA
mat <- t( apply( mat , 1 , function(x) { x[ which.max(x) ] <- NA ; return(x) } ) ) 

# Another sanity check...!
mat

      SXK   llq   xFL   RVW   oYQ   Seb
SXK    NA    NA 0.499 0.585 0.813 0.751
llq    NA    NA 0.075 0.533 0.794 0.826
xFL 0.499 0.099    NA 0.099 0.481    NA
RVW 0.075    NA 0.620    NA 0.620 0.307
oYQ 0.585 0.794 0.751    NA    NA 0.682
Seb 0.533 0.481    NA 0.307 0.682    NA


# Now return the two remaining columns with greatest correlation in that row
res <- t( apply( mat , 1 , function(x) { y <- names( sort(x , TRUE ) )[1:2] ; return( y ) } ) )

res


[,1]  [,2] 
SXK "oYQ" "Seb"
llq "Seb" "oYQ"
xFL "SXK" "oYQ"
RVW "xFL" "oYQ"
oYQ "llq" "xFL"
Seb "oYQ" "SXK"

這回答了你的問題了嗎?

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM