I am wondering how to find the pairs of variables in a table that have the highest correlations.
For instance, I have a file "mydata" with 5 numeric columns. If I run cor(mydata), it shows me all the possible correlations. I want to know which pairs are highly correlated. I tried sort(cor(mydata)), but understandably this just gives me a vector of the ordered values. How can I then tell which pair is responsible for a given value?
PS: I'm not sure how to insert an example, I tried posting pictures but don't have the necessary points ¬¬
Let's say that if I have a table with 2 variables A and B, the output of sorting would be:
[1] 0.5 0.5 1.0 1.0
In this case it's easy to know that 0.5 comes from the pair A and B, but how could I know this when more than 2 variables are involved?
I think which(..., arr.ind = TRUE) will help.
which can take a logical vector, matrix, or array as an argument. By default (arr.ind = FALSE), it returns the matching positions as a plain vector of indices, but if you instead set arr.ind = TRUE (and the input has dimensions, i.e., it is a matrix or array, or the logical matrix you get from a comparison on a data.frame), it will honor the dimensionality of the source data and report the row and column of each matching element.
set.seed(42)
dat <- matrix(rbinom(25, 5, 0.5), ncol = 5)
which(dat > 3, arr.ind = TRUE)
## row col
## [1,] 1 1
## [2,] 2 1
## [3,] 4 1
## [4,] 3 3
## [5,] 1 4
## [6,] 2 4
## [7,] 1 5
## [8,] 3 5
## [9,] 4 5
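Applied to the correlation question, the same idea might look like the sketch below. The data here are simulated stand-ins for "mydata" (column E is deliberately built from A so that one pair is strongly correlated), and cor_pairs is just an illustrative name. The diagonal and lower triangle are set to NA first so that each pair is listed once and the trivial self-correlations of 1 drop out.

```r
# Simulated stand-in for "mydata": five numeric columns (an assumption,
# not the asker's real data). E is built from A so one pair stands out.
set.seed(1)
mydata <- data.frame(A = rnorm(50), B = rnorm(50), C = rnorm(50), D = rnorm(50))
mydata$E <- mydata$A + rnorm(50, sd = 0.1)

cm <- cor(mydata)

# Keep only the upper triangle: each pair once, no self-correlations
cm[lower.tri(cm, diag = TRUE)] <- NA

# Row/column positions of the remaining entries
idx <- which(!is.na(cm), arr.ind = TRUE)

# Translate positions back into variable names, alongside the value
cor_pairs <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  cor  = cm[idx]
)

# Strongest correlations first, by absolute value
cor_pairs <- cor_pairs[order(abs(cor_pairs$cor), decreasing = TRUE), ]
head(cor_pairs)
```

If only pairs above some cutoff are of interest, the which() condition could instead be something like abs(cm) > 0.8; NA entries are not TRUE, so which() skips them.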