Find the percentage of times the second-highest value appears in each column of an R data frame

I have a dataframe in R as follows:

set.seed(123) 
df <- as.data.frame(matrix(rnorm(20*5,mean = 0,sd=1),20,5))

I want to find the percentage of times that the highest value of each row appears in each column, which I can do as follows:

A <- table(names(df)[max.col(df)])/nrow(df)
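The toy data frame below (the name `toy` and its values are made up for illustration) shows what this line does. `max.col()` returns the column index of each row's maximum, and `table()` counts how often each column name occurs.

```r
# Hypothetical 3-row data frame with distinct values in every row
toy <- data.frame(V1 = c(3, 1, 2), V2 = c(1, 5, 9), V3 = c(2, 4, 0))

# Column index of the maximum in each row: row 1 -> V1, rows 2 and 3 -> V2
idx <- max.col(toy)
print(idx)  # 1 2 2

# Share of rows whose maximum falls in each column; V3 never holds a row
# maximum, so it is missing from the table instead of showing 0
share <- table(names(toy)[idx]) / nrow(toy)
print(share)
```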

Then the percentage of times that the second highest value of each row appears in each column can be found as follows:

df2 <- as.data.frame(t(apply(df, 1, function(r) {
  # mask each row's maximum with -Inf so that max.col() finds the runner-up
  r[which.max(r)] <- -Inf
  return(r)
})))
B <- table(names(df2)[max.col(df2)])/nrow(df2)
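Replacing the row maximum with a fixed value such as 0.001 only works when the second-highest value is larger than that value. With `rnorm()` data this fails whenever the second-highest value in a row is below 0.001, for example when every value is negative. The hand-made row below (values are hypothetical) shows the difference; masking with `-Inf` is this sketch's suggestion rather than something from the original post.

```r
# Hypothetical row in which every value is negative
r <- c(V1 = -1, V2 = -2, V3 = -3)

# Masking the maximum with 0.001 makes the masked cell the new maximum
masked_fixed <- r
masked_fixed[which.max(r)] <- 0.001
second_fixed <- names(which.max(masked_fixed))
print(second_fixed)  # "V1": wrong, this is still the highest column

# -Inf can never beat a finite value, so the true runner-up is found
masked_inf <- r
masked_inf[which.max(r)] <- -Inf
second_inf <- names(which.max(masked_inf))
print(second_inf)    # "V2"
```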

How can I calculate the following in R?

   C <- the percentage of rows in which the highest and the second-highest values both
appear in the first two columns of `df` (in either order)

I would do it like this:

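# compute reverse rank: 1 = highest value in the row, ncol(df) = lowest
df.rank <- ncol(df) - t(apply(df, 1, rank)) + 1

# share of rows in which each column holds the highest / second-highest value
A <- colMeans(df.rank == 1)
B <- colMeans(df.rank == 2)
# share of rows in which columns 1 and 2 hold ranks 1 and 2 (in either order)
C <- mean(apply(df.rank[, 1:2], 1, prod) == 2)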

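First, I compute the reverse rank, which is analogous to using decreasing = TRUE with sort() or order(). A and B are then straightforward, because the mean of a logical vector is the proportion of TRUE values. Note that your original approach drops columns in which the (second) maximum never appears instead of reporting them with a proportion of zero, which may cause problems later (for example, A and B may then have different lengths).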
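If you prefer to keep the `table()` approach, one way to keep the zero counts is to turn the column names into a factor whose levels are all columns of `df`; `table()` then reports every level, including those that never occur. The sketch below rebuilds the `df` from the question and checks that this agrees with `colMeans()` on the reverse-rank matrix (the names `winner`, `A_table` and `A_rank` are illustrative).

```r
set.seed(123)
df <- as.data.frame(matrix(rnorm(20 * 5, mean = 0, sd = 1), 20, 5))

# Reverse rank: 1 = highest value in the row, 5 = lowest
df.rank <- ncol(df) - t(apply(df, 1, rank)) + 1

# Levels cover every column, so a column that never holds the row
# maximum is reported with a count of zero instead of being dropped
winner <- factor(names(df)[max.col(df)], levels = names(df))
A_table <- table(winner) / nrow(df)

A_rank <- colMeans(df.rank == 1)

# Same proportions, in the same column order
all.equal(as.numeric(A_table), unname(A_rank))
```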

For C, I take only the first two columns of the rank matrix and compute their product for every row. If the two largest values are in the first two columns, their ranks are 1 and 2 and the product is 2; any other pair of distinct ranks gives a product of at least 3.
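To check the product rule, apply it to a hand-made rank matrix (values are hypothetical) where the answer is known. The second line is an equivalent, more explicit test that compares the sorted ranks with `c(1, 2)`; that variant is this sketch's own choice, not part of the original answer.

```r
# Hypothetical reverse ranks for 4 rows and 5 columns (1 = highest)
toy_rank <- rbind(
  c(1, 2, 3, 4, 5),  # top two in columns 1-2          -> counted
  c(2, 1, 5, 3, 4),  # top two in columns 1-2          -> counted
  c(1, 3, 2, 4, 5),  # second-highest is in column 3   -> not counted
  c(3, 4, 1, 2, 5)   # neither in columns 1-2          -> not counted
)

# Product rule: ranks 1 and 2 are the only pair with product 2
C_prod <- mean(apply(toy_rank[, 1:2], 1, prod) == 2)

# Explicit alternative: the two ranks, sorted, must be exactly 1 and 2
C_sorted <- mean(apply(toy_rank[, 1:2], 1,
                       function(x) identical(sort(x), c(1, 2))))

print(C_prod)    # 0.5
print(C_sorted)  # 0.5
```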

Also, if ties can occur in your data, choose an appropriate ties.method argument for rank(); with the default, "average", tied values get fractional ranks such as 1.5.
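To illustrate why `ties.method` matters, take a hypothetical row in which the first two columns tie for the highest value. This sketch computes reverse ranks as `rank(-x)`, which matches `ncol(df) - rank(x) + 1` when there are no ties (and also under the default "average" method).

```r
# Hypothetical row: columns 1 and 2 tie for the highest value
x <- c(V1 = 5, V2 = 5, V3 = 1, V4 = 2, V5 = 3)

# Reverse ranks (1 = highest) obtained by ranking the negated values
rank_avg   <- rank(-x)                         # default ties.method = "average"
rank_min   <- rank(-x, ties.method = "min")
rank_first <- rank(-x, ties.method = "first")

print(rank_avg[1:2])    # 1.5 1.5 -> product 2.25, row not counted in C
print(rank_min[1:2])    # 1 1     -> product 1, row not counted either
print(rank_first[1:2])  # 1 2     -> product 2, row counted (ties broken by column order)
```

Which option is right depends on whether a tie for first place should count toward C; "first" silently favours earlier columns, so it is only one possible choice.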

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM