Locate % of times that the second highest value appears for each column in R data frame

Question

I have a dataframe in R as follows:

set.seed(123)
df <- as.data.frame(matrix(rnorm(20 * 5, mean = 0, sd = 1), 20, 5))

I want to find the percentage of times that the highest value of each row appears in each column, which I can do as follows:

A <- table(names(df)[max.col(df)])/nrow(df)

Then the percentage of times that the second highest value of each row appears in each column can be found as follows:

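df2 <- as.data.frame(t(apply(df, 1, function(r) {
  # Mask the row maximum with -Inf so the runner-up becomes the new maximum
  # (a fixed value such as 0.001 fails when the second highest value is below it)
  r[which.max(r)] <- -Inf
  r
})))
B <- table(names(df2)[max.col(df2)]) / nrow(df2)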

How can I calculate in R the following?

   C <- the percentage of rows in which the highest and the second highest values both fall in the first two columns of `df`

Answer 1

I would do it like this:

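# Reverse rank within each row: 1 = highest value, 2 = second highest, ...
df.rank <- ncol(df) - t(apply(df, 1, rank)) + 1

A <- colMeans(df.rank == 1)  # share of rows where each column holds the highest value
B <- colMeans(df.rank == 2)  # share of rows where each column holds the second highest
# Share of rows where ranks 1 and 2 both fall in the first two columns
C <- mean(apply(df.rank[, 1:2], 1, prod) == 2)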

First I compute the reverse rank, which is analogous to using decreasing = TRUE with sort() or order(). A and B are then straightforward. Note that your original approach omits columns in which no (second) maximum ever appears instead of reporting zero for them, which may cause problems in later use.
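To illustrate the point about omitted zeros, here is a minimal sketch on a made-up data frame (`toy` and its values are hypothetical, not from the question). It contrasts the plain table() result with one where all column names are declared as factor levels, and with the colMeans() route used above:

```r
# Toy data (hypothetical): column V1 holds the maximum in every row
toy <- data.frame(V1 = c(3, 5), V2 = c(1, 2), V3 = c(2, 1))
winners <- names(toy)[max.col(toy)]

# table() on the raw names reports only the columns that actually occur
plain <- table(winners) / nrow(toy)

# Declaring all column names as factor levels keeps the zero entries
full <- table(factor(winners, levels = names(toy))) / nrow(toy)

# colMeans() on the reverse-rank matrix has one entry per column by construction
toy.rank <- ncol(toy) - t(apply(toy, 1, rank)) + 1
A_toy <- colMeans(toy.rank == 1)
```

Here `plain` has a single entry (V1), while `full` and `A_toy` each have three, with zeros for V2 and V3.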

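For C, I take only the first two columns of the rank matrix and compute their product for every row. Without ties, the ranks in a row are the distinct integers 1 to 5, so the product is 2 only when the first two columns hold ranks 1 and 2 (in either order), that is, the two largest values of the row.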
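As a sanity check, the product trick can be compared with a more literal formulation: take the positions of the two largest values in each row and test whether both lie in columns 1 and 2. This is only a sketch, and `top2` and `C_alt` are names introduced here for illustration:

```r
set.seed(123)
df <- as.data.frame(matrix(rnorm(20 * 5), 20, 5))

# Rank-product version, as in the answer
df.rank <- ncol(df) - t(apply(df, 1, rank)) + 1
C <- mean(apply(df.rank[, 1:2], 1, prod) == 2)

# Literal version: column indices of the two largest values in each row
top2 <- t(apply(df, 1, function(r) order(r, decreasing = TRUE)[1:2]))
C_alt <- mean(apply(top2, 1, function(idx) all(idx %in% 1:2)))

# Both should agree when the data contain no ties
stopifnot(isTRUE(all.equal(C, C_alt)))
```

The literal version is slower to read but makes no use of the "product equals 2" argument, so agreement between the two is a reasonable check on that reasoning.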

Also, if ties might appear in your data set, you should consider selecting an appropriate ties.method argument for rank().
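To see why ties.method matters, consider one row with a tied maximum. The sketch below uses a hypothetical helper, `rev_rank`, that wraps the reverse-rank computation for a single vector; the input values are made up:

```r
# Hypothetical helper: reverse rank of one vector (1 = largest)
rev_rank <- function(x, ties.method = "average") {
  length(x) - rank(x, ties.method = ties.method) + 1
}

x <- c(0.4, 0.9, 0.9, -1.2, 0.1)  # the maximum 0.9 appears twice

r_avg   <- rev_rank(x)             # 3, 1.5, 1.5, 5, 4: no element equals 1 or 2
r_first <- rev_rank(x, "first")    # 3, 2, 1, 5, 4: exactly one 1 and one 2
r_max   <- rev_rank(x, "max")      # 3, 1, 1, 5, 4: both tied values count as highest
```

With the default "average", a row with a tied maximum contributes to neither A nor B, whereas "first" breaks the tie by position and "max" counts both tied columns as the highest. Which behavior is appropriate depends on how you want ties to be counted.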

solution1
1 ACCEPTED 2022-08-07 15:19:03