Using R to find correlation pairs
VZ.Close CBOU.Close SBUX.Close T.Close
VZ.Close 1.0000000 0.5804478 0.8872978 0.9480894
CBOU.Close 0.5804478 1.0000000 0.7876277 0.4988890
SBUX.Close 0.8872978 0.7876277 1.0000000 0.8143305
T.Close 0.9480894 0.4988890 0.8143305 1.0000000
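So, suppose I have these correlations between stock prices. I want to look at the first row and find the pair with the highest correlation. That would be VZ and T. Then I want to remove those two stocks from consideration and find the pair with the highest correlation among the remaining stocks, and so on until every stock is paired. In this example the second pair is obviously CBOU and SBUX, since they are the only two left, but I would like the code to handle any number of pairs.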
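If you want to take the maximum correlation at each step, here is a solution. Note that the first step then looks at the whole matrix, not just the first row.
Sample data: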
d <- matrix(runif(36),ncol=6,nrow=6)
rownames(d) <- colnames(d) <- LETTERS[1:6]
diag(d) <- 1
d
A B C D E F
A 1.00000000 0.65209204 0.8520392 0.26980214 0.5844000 0.69335143
B 0.73531603 1.00000000 0.5499431 0.60511580 0.7483990 0.14788134
C 0.56433218 0.27242769 1.0000000 0.07952776 0.2147628 0.03711562
D 0.91756919 0.04853523 0.5554490 1.00000000 0.4344089 0.23381447
E 0.06897889 0.80740821 0.7974340 0.87425643 1.0000000 0.74546072
F 0.19961474 0.61665231 0.2829632 0.58110694 0.7433924 1.00000000
And the code:
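# Accumulator for the selected pairs
results <- data.frame(v1=character(0), v2=character(0), cor=numeric(0), stringsAsFactors=FALSE)
# Zero the diagonal so a variable is never paired with itself
diag(d) <- 0
while (sum(d>0)>1) {
  maxval <- max(d)
  # Row/column position of the first cell holding the maximum
  # (named idx rather than max, to avoid shadowing base::max)
  idx <- which(d==maxval, arr.ind=TRUE)[1,]
  results <- rbind(results, data.frame(v1=rownames(d)[idx[1]], v2=colnames(d)[idx[2]], cor=maxval))
  # Zero the rows and columns of both variables so they cannot be picked again
  d[idx[1],] <- 0
  d[,idx[1]] <- 0
  d[idx[2],] <- 0
  d[,idx[2]] <- 0
}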
This gives:
v1 v2 cor
1 D A 0.9175692
2 E B 0.8074082
3 F C 0.2829632
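The same idea can be applied to the correlation matrix from the question. The following is only a minimal sketch, assuming a symmetric matrix with named rows and columns; `greedy_pairs`, `cm`, `stocks` and `pairs_found` are hypothetical names introduced here for illustration and are not part of the answer above. Instead of zeroing entries it drops the paired variables from the matrix, so the loop does not rely on the remaining correlations being positive.

```r
# Correlation matrix from the question (values copied from the post)
stocks <- c("VZ.Close", "CBOU.Close", "SBUX.Close", "T.Close")
cm <- matrix(c(1.0000000, 0.5804478, 0.8872978, 0.9480894,
               0.5804478, 1.0000000, 0.7876277, 0.4988890,
               0.8872978, 0.7876277, 1.0000000, 0.8143305,
               0.9480894, 0.4988890, 0.8143305, 1.0000000),
             nrow = 4, byrow = TRUE, dimnames = list(stocks, stocks))

# Hypothetical helper: repeatedly take the highest remaining correlation
# and remove both members of that pair from further consideration.
greedy_pairs <- function(m) {
  diag(m) <- NA                                   # never pair a variable with itself
  out <- data.frame(v1 = character(0), v2 = character(0),
                    cor = numeric(0), stringsAsFactors = FALSE)
  while (nrow(m) >= 2) {
    best <- max(m, na.rm = TRUE)
    idx <- which(m == best, arr.ind = TRUE)[1, ]  # first cell holding the maximum
    out <- rbind(out, data.frame(v1 = rownames(m)[idx[1]],
                                 v2 = colnames(m)[idx[2]],
                                 cor = best, stringsAsFactors = FALSE))
    keep <- setdiff(seq_len(nrow(m)), idx)        # drop both paired variables
    m <- m[keep, keep, drop = FALSE]
  }
  out
}

pairs_found <- greedy_pairs(cm)
pairs_found
```

On the question's data this pairs VZ with T first and then CBOU with SBUX. With an odd number of variables, the last one is simply left unpaired.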
I think this answers your question, but I can't be sure, because the original question is a bit vague...
# Construct toy example of symmetrical matrix
# nc is number of rows/columns in matrix, in the problem above it was 4, but let's try with 6
nc <- 6
mat <- diag( 1 , nc )
# Create toy correlation data for matrix
dat <- runif( ( (nc^2-nc)/2 ) )
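# Fill both triangles of the matrix from the same vector (each triangle is filled column by column, so the result is not exactly symmetric, as the printout below shows)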
mat[lower.tri( mat ) ] <- dat
mat[upper.tri( mat ) ] <- dat
# Create vector of random string names for row/column names
names <- replicate( nc , expr = paste( sample( c( letters , LETTERS ) , 3 , replace = TRUE ) , collapse = "" ) )
dimnames(mat) <- list( names , names )
# Sanity check
mat
SXK llq xFL RVW oYQ Seb
SXK 1.000 0.973 0.499 0.585 0.813 0.751
llq 0.973 1.000 0.075 0.533 0.794 0.826
xFL 0.499 0.099 1.000 0.099 0.481 0.968
RVW 0.075 0.813 0.620 1.000 0.620 0.307
oYQ 0.585 0.794 0.751 0.968 1.000 0.682
Seb 0.533 0.481 0.826 0.307 0.682 1.000
# OK, on to the problem at hand: you can just substitute your matrix into these lines
# Clearly the diagonal in a correlation matrix will be 1 so this is excluded as per your problem
diag( mat ) <- NA
# Now find the next highest correlation in each row and set this to NA
mat <- t( apply( mat , 1 , function(x) { x[ which.max(x) ] <- NA ; return(x) } ) )
# Another sanity check...!
mat
SXK llq xFL RVW oYQ Seb
SXK NA NA 0.499 0.585 0.813 0.751
llq NA NA 0.075 0.533 0.794 0.826
xFL 0.499 0.099 NA 0.099 0.481 NA
RVW 0.075 NA 0.620 NA 0.620 0.307
oYQ 0.585 0.794 0.751 NA NA 0.682
Seb 0.533 0.481 NA 0.307 0.682 NA
# Now return the two remaining columns with greatest correlation in that row
res <- t( apply( mat , 1 , function(x) { y <- names( sort(x , TRUE ) )[1:2] ; return( y ) } ) )
res
[,1] [,2]
SXK "oYQ" "Seb"
llq "Seb" "oYQ"
xFL "SXK" "oYQ"
RVW "xFL" "oYQ"
oYQ "llq" "xFL"
Seb "oYQ" "SXK"
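One caveat on the toy data above: assigning the same vector to both `lower.tri` and `upper.tri` fills each triangle column by column, so mirrored cells do not receive the same value, which is why the printed matrix is not symmetric. A minimal sketch of one way to build a truly symmetric toy matrix, assuming the same `nc` (`sym` is a name introduced here for illustration only):

```r
nc <- 6
sym <- diag(1, nc)
# Fill the lower triangle with random values
sym[lower.tri(sym)] <- runif((nc^2 - nc) / 2)
# Mirror the lower triangle into the upper triangle via the transpose
sym[upper.tri(sym)] <- t(sym)[upper.tri(sym)]
# Check that the matrix equals its own transpose
isSymmetric(sym)
```

A real correlation matrix returned by `cor()` is already symmetric, so this only matters when simulating test data.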
Does this answer your question?