
Filter a correlation matrix based on value and occurrence

Does anyone have a way to filter a correlation matrix (or list of correlations) based on a ranking that includes value and breadth? For example, if a certain variable has a high enough correlation with a large enough number of other variables, then keep it. If a variable does not meet these criteria, filter it out.

For example: if a variable has a correlation > 0.25 with more than 3 other variables, keep it. Otherwise, discard it.

Currently I can construct a correlation matrix and filter it by value, but I have not been able to get past this. For filtering, I'm setting values whose magnitude is below my threshold to 0:

correlation_matrix <- round(cor(data, method = "pearson", use = "pairwise.complete.obs"), digits = 4)
correlation_matrix[abs(correlation_matrix) < 0.13] <- 0

I've now done this using apply as Rui mentioned above.

This is code to select all rows (and columns) in the correlation matrix that contain at least 75 (breadth) values whose absolute value is at least 0.2 (threshold):

1) define variables; set the diagonal values from 1 to 0 so that each variable's correlation with itself is not counted

threshold <- 0.2
breadth <- 75
correlation_matrix_filter <- correlation_matrix
diag(correlation_matrix_filter) <- 0

2) count how many values per row have an absolute value of at least the threshold (0.2)

filter <- apply(correlation_matrix_filter, 1, function(x) sum(abs(x) >= threshold))

3) select only rows containing at least 75 values that meet the threshold; subset the original correlation matrix to only include these rows (and columns)

sel <- filter >= breadth
correlation_matrix_final <- correlation_matrix[sel, sel, drop = FALSE]  # drop = FALSE keeps a matrix even if only one variable is selected
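
The three steps above can be wrapped into one function and tried on a small simulated data set. This is only a sketch: the helper name `filter_by_breadth`, the toy data frame `toy`, and the threshold/breadth values used here are assumptions for illustration, not part of the original code. It also swaps `apply` for `rowSums`, which gives the same per-row count on a logical matrix and, with `na.rm = TRUE`, skips any NA correlations.

```r
# Hypothetical helper: keep variables whose absolute correlation reaches
# `threshold` with at least `breadth` other variables.
filter_by_breadth <- function(cm, threshold, breadth) {
  cm_offdiag <- cm
  diag(cm_offdiag) <- 0                  # ignore self-correlations
  # count qualifying correlations per variable (NA entries are skipped)
  counts <- rowSums(abs(cm_offdiag) >= threshold, na.rm = TRUE)
  sel <- counts >= breadth
  cm[sel, sel, drop = FALSE]             # stay a matrix even if one variable survives
}

# Simulated data, for illustration only: v1..v3 share a common signal,
# v4 and v5 are independent noise.
set.seed(1)
base <- rnorm(200)
toy <- data.frame(
  v1 = base + rnorm(200, sd = 0.5),
  v2 = base + rnorm(200, sd = 0.5),
  v3 = base + rnorm(200, sd = 0.5),
  v4 = rnorm(200),
  v5 = rnorm(200)
)

cm <- cor(toy, method = "pearson", use = "pairwise.complete.obs")
kept <- filter_by_breadth(cm, threshold = 0.5, breadth = 2)
colnames(kept)
```

With this toy data one would expect only the three related variables to be kept. Note that the filter is a single pass: counts are taken over the full matrix, so after the subset a kept variable may have fewer than `breadth` qualifying partners among the remaining ones.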


 