Find first, second, and third maximum of each row and their corresponding column names in a data frame in R
I am trying to find the first, second, and third maximum values and the corresponding column names for each row, but I have not been able to do this in R. Please help.
Here is what the data frame looks like:
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
10003 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
10006 0.0 0.0 0.0 0.0 0.0 0.0 16.7 0.0 0.0 0.0 0.0 0.0
10007 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
10008 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
10010 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
10014 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
This is the sample data you posted in your comment:
data <- read.table(text = "x1 x2 x3 x4 x5 x6 x7 x8 x9
1003 0 45.7 0 22.9 0 13.7 0 0 23.1
1004 22.2 0 13.2 0 5.4 0 9.7 0 0
1005 0 0 0 12 2.1 0 0 3.2 0
1006 1.2 0 1.2 0 43.9 43.9 0 0 57.6",
header = TRUE)
You can use dplyr and tidyverse to achieve this.
The following code gives you the three largest values across all rows combined:
library(dplyr)
library(tidyverse)
data %>%
rownames_to_column() %>%
gather(column, value, -rowname) %>%
group_by(rowname) %>%
arrange(desc(value)) %>%
head(3)
This will give you the following result:
# A tibble: 3 x 3
# Groups: rowname [3]
# rowname column value
# <chr> <chr> <dbl>
# 1 1006 x9 57.6
# 2 1003 x2 45.7
# 3 1006 x5 43.9
If you want the three largest values for each row, you can do it as follows:
result <- data %>%
rownames_to_column() %>%
gather(column, value, -rowname) %>%
group_by(rowname) %>%
mutate(max = rank(-value)) %>%
filter(max <= 3) %>%
arrange(rowname, max)
This gives the following result (tied values share an averaged rank, hence the 2.5 in row 1006):
# A tibble: 12 x 4
# Groups: rowname [4]
# rowname column value max
# <chr> <chr> <dbl> <dbl>
# 1 1003 x2 45.7 1
# 2 1003 x9 23.1 2
# 3 1003 x4 22.9 3
# 4 1004 x1 22.2 1
# 5 1004 x3 13.2 2
# 6 1004 x7 9.7 3
# 7 1005 x4 12 1
# 8 1005 x8 3.2 2
# 9 1005 x5 2.1 3
# 10 1006 x9 57.6 1
# 11 1006 x5 43.9 2.5
# 12 1006 x6 43.9 2.5
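If whole-number ranks are preferred when values tie, one option is to swap `rank()` for dplyr's `row_number()`, which breaks ties by order of appearance. The following is a minimal sketch under that assumption, reusing the sample data from above; the name `top3` is only illustrative.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Sample data from the answer above; the first field of each data line becomes the row name.
data <- read.table(text = "x1 x2 x3 x4 x5 x6 x7 x8 x9
1003 0 45.7 0 22.9 0 13.7 0 0 23.1
1004 22.2 0 13.2 0 5.4 0 9.7 0 0
1005 0 0 0 12 2.1 0 0 3.2 0
1006 1.2 0 1.2 0 43.9 43.9 0 0 57.6",
header = TRUE)

top3 <- data %>%
  rownames_to_column() %>%
  gather(column, value, -rowname) %>%
  group_by(rowname) %>%
  # row_number() assigns distinct integer ranks; ties go to the earlier column
  mutate(max = row_number(desc(value))) %>%
  filter(max <= 3) %>%
  arrange(rowname, max) %>%
  ungroup()

print(top3)
```

With this variant each row always contributes exactly three entries, even when several columns hold the same value (for example a row of all zeros).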
To summarize the result for each row, use the following code:
result %>%
mutate(result = paste0(column, "=", value, collapse = ", ")) %>%
select(result) %>%
distinct()
This will give you the following result:
# A tibble: 4 x 2
# Groups: rowname [4]
# rowname result
# <chr> <chr>
# 1 1003 x2=45.7, x9=23.1, x4=22.9
# 2 1004 x1=22.2, x3=13.2, x7=9.7
# 3 1005 x4=12, x8=3.2, x5=2.1
# 4 1006 x9=57.6, x5=43.9, x6=43.9
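The same one-line-per-row summary can probably also be produced with `summarise()`, which collapses each group to a single row directly and so needs no `distinct()` step. This is a sketch rather than the answer's own method; `summary_tbl` is an illustrative name, and `slice_head()` assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Sample data from the answer above.
data <- read.table(text = "x1 x2 x3 x4 x5 x6 x7 x8 x9
1003 0 45.7 0 22.9 0 13.7 0 0 23.1
1004 22.2 0 13.2 0 5.4 0 9.7 0 0
1005 0 0 0 12 2.1 0 0 3.2 0
1006 1.2 0 1.2 0 43.9 43.9 0 0 57.6",
header = TRUE)

summary_tbl <- data %>%
  rownames_to_column() %>%
  gather(column, value, -rowname) %>%
  group_by(rowname) %>%
  # sort within each row id, largest value first
  arrange(desc(value), .by_group = TRUE) %>%
  # keep the first three entries of every group
  slice_head(n = 3) %>%
  # collapse the three name/value pairs into one string per row id
  summarise(result = paste0(column, "=", value, collapse = ", "))

print(summary_tbl)
```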
Hope it helps.
Here is my approach:
# Make up some data, since the data in the question is hard to reproduce:
df <- data.frame(X1 = 1:5, X2 = c(3, 5, 1, 6, 7))
# Flatten the data frame and keep the three largest values overall:
a <- sort(unlist(df), decreasing = TRUE)[1:3]
# Loop over those values to get their row and column indexes:
for (i in seq_along(a)) {
  print(which(df == a[i], arr.ind = TRUE))
}
This will give you what you need. Replace print with whatever you want to do with the indexes (e.g. assign them to a variable).
You can use:
max.names = apply(data, 1, function(x) names(sort(x, decreasing = TRUE)[1:3]))
max.vals = apply(data, 1, function(x) sort(x, decreasing = TRUE)[1:3])
data = cbind(data, t(max.names), t(max.vals))
# x1 x2 x3 x4 x5 x6 x7 x8 x9 1 2 3 1 2 3
# 1003 0.0 45.7 0.0 22.9 0.0 13.7 0.0 0.0 23.1 x2 x9 x4 45.7 23.1 22.9
# 1004 22.2 0.0 13.2 0.0 5.4 0.0 9.7 0.0 0.0 x1 x3 x7 22.2 13.2 9.7
# 1005 0.0 0.0 0.0 12.0 2.1 0.0 0.0 3.2 0.0 x4 x8 x5 12.0 3.2 2.1
# 1006 1.2 0.0 1.2 0.0 43.9 43.9 0.0 0.0 57.6 x9 x5 x6 57.6 43.9 43.9
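The combined data frame above ends up with duplicated column names (1 2 3 1 2 3). A possible variation, sketched below, computes the column order once per row with `order()` and stores names and values under descriptive column names. The names `name1`, `value1`, and so on are made up for illustration and are not part of the original answer.

```r
# Sample data from the earlier answer.
data <- read.table(text = "x1 x2 x3 x4 x5 x6 x7 x8 x9
1003 0 45.7 0 22.9 0 13.7 0 0 23.1
1004 22.2 0 13.2 0 5.4 0 9.7 0 0
1005 0 0 0 12 2.1 0 0 3.2 0
1006 1.2 0 1.2 0 43.9 43.9 0 0 57.6",
header = TRUE)

m <- as.matrix(data)

# For every row, the column indexes sorted by decreasing value; keep the first three.
idx <- t(apply(m, 1, order, decreasing = TRUE))[, 1:3, drop = FALSE]

# Build the result one rank at a time: column name and value of the k-th largest entry.
result <- data.frame(row.names = rownames(m))
for (k in 1:3) {
  result[[paste0("name", k)]] <- colnames(m)[idx[, k]]
  result[[paste0("value", k)]] <- m[cbind(seq_len(nrow(m)), idx[, k])]
}

print(result)
```

Because `order()` leaves ties in their original order, equal values are reported left to right by column. If needed, `cbind(data, result)` attaches these columns to the original data.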