
Locate % of times that the second highest value appears for each column in R data frame

I have a data frame in R as follows:

set.seed(123) 
df <- as.data.frame(matrix(rnorm(20*5,mean = 0,sd=1),20,5))

I want to find the percentage of times that the highest value of each row appears in each column, which I can do as follows:

A <- table(names(df)[max.col(df)])/nrow(df)

Then the percentage of times that the second highest value of each row appears in each column can be found as follows:

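df2 <- as.data.frame(t(apply(df, 1, function(r) {
  # Mask the row maximum with a value below every entry in the row, so that
  # the second highest value becomes the new maximum. A fixed constant such
  # as 0.001 would fail whenever the second highest value is smaller than it.
  r[which.max(r)] <- min(r) - 1
  r
})))
B <- table(names(df2)[max.col(df2)]) / nrow(df2)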

How can I calculate the following in R?

   C<- The percentage of times that the first and the second highest values 
appear in the first two columns of `df` simultaneously

I would do it like this:

# compute reverse rank
df.rank <- ncol(df) - t(apply(df, 1, rank)) + 1

A <- colMeans(df.rank == 1)
B <- colMeans(df.rank == 2)
C <- mean(apply(df.rank[, 1:2], 1, prod)==2)

First I compute the reverse rank, which is analogous to using decreasing = TRUE with sort() or order(): rank 1 marks the largest value in a row, rank 2 the second largest, and so on. A and B are then straightforward. Please note that your original approach omits zeros for columns where no (second) maximum value appears, because table() only reports names that occur at least once. This may cause problems in later usage.
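The point about omitted zeros can be seen on a small example. The following is a minimal sketch, assuming a hand-made data frame `toy` (not part of the question) in which column V3 never holds the row maximum. It compares the table() approach, a table() variant with explicit factor levels, and the rank-based approach.

```r
# Hypothetical toy data: V3 is never the largest value in any row
toy <- data.frame(V1 = c(5, 1, 9), V2 = c(2, 7, 3), V3 = c(1, 0, 2))

# Column name holding the maximum of each row
top <- names(toy)[max.col(toy, ties.method = "first")]

# table() only reports names that occur, so V3 is missing from the result
share_table <- table(top) / nrow(toy)

# Declaring every column name as a factor level keeps the zero entry
share_full <- table(factor(top, levels = names(toy))) / nrow(toy)

# Rank-based version: one entry per column by construction
toy_rank <- ncol(toy) - t(apply(toy, 1, rank)) + 1
share_rank <- colMeans(toy_rank == 1)
```

Here `share_table` has two entries, while `share_full` and `share_rank` both have three, with a zero for V3. If you prefer to stay with table(), the factor() variant is one way to keep the result aligned with the columns of the data frame.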

For C, I take only the first two columns of the rank matrix and compute their product for every row. If the two largest values are in the first two columns, their ranks are 1 and 2 in some order, so the product has to be 2. Without ties, no other pair of distinct ranks gives a product of 2.
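As a cross-check of the product test, here is a small sketch that recomputes C on the data from the question in a second, more literal way: a row counts if both of its first two columns have rank 1 or 2. The names `C_prod` and `C_alt` are introduced only for this illustration, and the equivalence assumes there are no ties in the data.

```r
set.seed(123)
df <- as.data.frame(matrix(rnorm(20 * 5, mean = 0, sd = 1), 20, 5))

# Reverse rank: 1 = largest value in the row
df.rank <- ncol(df) - t(apply(df, 1, rank)) + 1

# Product test: ranks 1 and 2 are the only pair with product 2
C_prod <- mean(apply(df.rank[, 1:2], 1, prod) == 2)

# Literal test: both of the first two columns hold a rank of at most 2
C_alt <- mean(rowSums(df.rank[, 1:2] <= 2) == 2)
```

Both values agree for continuous data such as rnorm() output. With ties and the default average ranks the two tests can diverge: two tied maxima in the first two columns get rank 1.5 each, so the product is 2.25 and the row is not counted by the product test, while the literal test counts it.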

Also, if ties might appear in your data set, you should consider selecting the appropriate ties.method argument for rank().
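To illustrate why the choice matters, the sketch below applies the reverse rank to a single row with two tied maxima under three values of ties.method. The helper `rev_rank` is a hypothetical convenience function written for this example only.

```r
x <- c(3, 7, 7, 1)  # two values tied for the maximum

# Hypothetical helper: reverse rank of a vector (1 = largest)
rev_rank <- function(v, ties.method) {
  length(v) - rank(v, ties.method = ties.method) + 1
}

rev_avg   <- rev_rank(x, "average")  # 3.0 1.5 1.5 4.0
rev_max   <- rev_rank(x, "max")      # 3 1 1 4
rev_first <- rev_rank(x, "first")    # 3 2 1 4
```

With the default "average", no entry has reverse rank 1, so such a row would contribute to no column in A. With "max", both tied entries count as the top value, and with "first" the later of the tied entries gets reverse rank 1. Which behaviour is appropriate depends on how you want tied rows to be counted.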
