
Finding top values in a table in R

I am wondering how to find the pair of variables in a table that give the highest values.

For instance, I have a data set "mydata" with 5 numeric columns. If I run cor(mydata), it shows me all the possible correlations. I want to know which pairs are highly correlated. I tried sort(cor(mydata)), but understandably this gives me only a vector of the ordered values. How can I then tell which pair is responsible for a given value?

PS: I'm not sure how to insert an example, I tried posting pictures but don't have the necessary points ¬¬

Let's say that if I have a table with 2 variables A and B, the output of sorting would be:

[1] 0.5 0.5 1.0 1.0

In this case it's easy to know that 0.5 comes from the pair A and B, but how could I know this when more than 2 variables are involved?

I think which(..., arr.ind = TRUE) will help.

which() accepts a logical vector, matrix, or array. By default (arr.ind = FALSE) it returns a plain vector of positions. If you instead set arr.ind = TRUE and the input has dimensions (for example, the logical matrix you get from comparing a matrix or data.frame against a value), it honors the dimensionality of the source data and reports the row and column of each matching element.

set.seed(42)
dat <- matrix(rbinom(25, 5, 0.5), ncol = 5)
which(dat > 3, arr.ind = TRUE)
##       row col
##  [1,]   1   1
##  [2,]   2   1
##  [3,]   4   1
##  [4,]   3   3
##  [5,]   1   4
##  [6,]   2   4
##  [7,]   1   5
##  [8,]   3   5
##  [9,]   4   5
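
To tie this back to the correlation question, here is a minimal sketch of how the same idea might be applied to cor(mydata). The data below is made up to stand in for "mydata" (the real file is not shown in the question), the 0.8 cutoff is an arbitrary assumption, and top_pairs is just an illustrative name.

```r
# Hypothetical stand-in for the asker's "mydata": 5 numeric columns.
set.seed(1)
mydata <- data.frame(A = rnorm(50))
mydata$B <- mydata$A + rnorm(50, sd = 0.1)    # built to correlate strongly with A
mydata$C <- rnorm(50)
mydata$D <- -mydata$C + rnorm(50, sd = 0.1)   # built to anti-correlate strongly with C
mydata$E <- rnorm(50)

cm <- cor(mydata)

# The matrix is symmetric with 1s on the diagonal, so keep each pair once:
# blank out the diagonal and the lower triangle.
cm[lower.tri(cm, diag = TRUE)] <- NA

# Row/column positions of the strongly correlated pairs.
# which() treats NA as FALSE, so the blanked cells are skipped.
idx <- which(abs(cm) > 0.8, arr.ind = TRUE)

# Translate positions back into variable names, strongest pairs first.
top_pairs <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  cor  = cm[idx]
)
top_pairs <- top_pairs[order(-abs(top_pairs$cor)), ]
top_pairs
```

Taking abs() keeps strong negative correlations as well. Drop it if only positive correlations are of interest.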
