Find top N highest values with column names in R

Question

Here is a sample of the data I'm using in the analysis. I need to extract the top 3 values for each row, together with the names of the columns they come from. For example, this would be the output for the first 3 rows:

id, group1, weight1, group2, weight2, group3, weight3
1, V4, 0.277991043, V10, 0.050863724, V2, 0.033589251
2, V5, 0.164107486, V4, 0.119961612, V3, 0.098208573
3, V3, 0.124760077, V5, 0.089891235, V2, 0.071337172

What would be the easiest way to do so?

Answer 1

Here's another idea that would keep the data in a tidy format:

library(dplyr)
library(tidyr)

sample %>%
  gather(key, value, -node) %>%
  group_by(node) %>%
  # name the ranking column explicitly; by default top_n() uses the last column
  top_n(3, value) %>%
  # here we use arrange() to sort by node and value
  arrange(node, desc(value))

Which gives:

#Source: local data frame [75 x 3]
#Groups: node [25]
#
#    node   key      value
#   <int> <chr>      <dbl>
#1      1    V4 0.27799104
#2      1   V10 0.05086372
#3      1    V2 0.03358925
#4      2    V5 0.16410749
#5      2    V4 0.11996161
#6      2    V3 0.09820857
#7      3    V3 0.12476008
#8      3    V5 0.08989123
#9      3    V2 0.07133717
#10     4    V6 0.20665387
#..   ...   ...        ...
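
In current tidyr and dplyr releases, gather() and top_n() are superseded by pivot_longer() and slice_max(). Below is a minimal sketch of the same long-format step with those functions, assuming dplyr >= 1.0.0 and tidyr >= 1.0.0. The data frame `toy` and the result name `top3_long` are made up for illustration and are not the asker's data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for the asker's `sample` data frame
toy <- data.frame(
  node = 1:2,
  V1 = c(0.10, 0.40),
  V2 = c(0.30, 0.05),
  V3 = c(0.20, 0.25),
  V4 = c(0.05, 0.30)
)

top3_long <- toy %>%
  # one row per (node, column) pair
  pivot_longer(-node, names_to = "key", values_to = "value") %>%
  group_by(node) %>%
  # keep the 3 largest values per node; with_ties = FALSE guarantees exactly 3 rows
  slice_max(value, n = 3, with_ties = FALSE) %>%
  ungroup() %>%
  # sort explicitly so the result does not depend on slice_max() ordering
  arrange(node, desc(value))

print(top3_long)
```

Unlike top_n(), slice_max() with with_ties = FALSE never returns more than n rows per group, which matters if two columns share the same value.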

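If you really want the wide layout from the question, you could do the following. Note that, as written, the group* columns end up holding the values and the weight* columns the column names, which is the reverse of the naming in the question: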

sample %>%
  gather(key, value, -node) %>%
  group_by(node) %>%
  top_n(3, value) %>%
  arrange(node, desc(value)) %>%
  mutate(group  = paste0("group", row_number()),
         weight = paste0("weight", row_number())) %>%
  spread(group, value) %>%
  spread(weight, key) %>%
  summarise_each(funs(max(., na.rm = TRUE)))

Which gives:

#Source: local data frame [25 x 7]
#
#    node    group1     group2      group3 weight1 weight2 weight3
#   <int>     <dbl>      <dbl>       <dbl>   <chr>   <chr>   <chr>
#1      1 0.2779910 0.05086372 0.033589251      V4     V10      V2
#2      2 0.1641075 0.11996161 0.098208573      V5      V4      V3
#3      3 0.1247601 0.08989123 0.071337172      V3      V5      V2
#4      4 0.2066539 0.14747281 0.121561100      V6      V2     V10
#5      5 0.2773512 0.21849008 0.158989123      V1      V8      V3
#6      6 0.1509917 0.11964171 0.117722329      V9      V3     V10
#7      7 0.2415227 0.13595649 0.130838132      V9      V7      V8
#8      8 0.1090851 0.10588612 0.088611644      V9      V7      V5
#9      9 0.1868202 0.11548305 0.089571337     V10      V1      V6
#10    10 0.3429303 0.12955854 0.003838772      V5      V6     V11
#..   ...       ...        ...         ...     ...     ...     ...
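
spread(), summarise_each() and funs() are deprecated or superseded in current tidyr and dplyr releases. The wide layout can be sketched in one reshaping step with pivot_wider(), assuming dplyr >= 1.0.0 and tidyr >= 1.0.0. Here the column names go into group* and the values into weight*, as in the question. The data frame `toy`, the helper column `rank` and the result name `top3_wide` are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for the asker's `sample` data frame
toy <- data.frame(
  node = 1:2,
  V1 = c(0.10, 0.40),
  V2 = c(0.30, 0.05),
  V3 = c(0.20, 0.25),
  V4 = c(0.05, 0.30)
)

top3_wide <- toy %>%
  # column names go to `group`, cell values go to `weight`
  pivot_longer(-node, names_to = "group", values_to = "weight") %>%
  group_by(node) %>%
  slice_max(weight, n = 3, with_ties = FALSE) %>%
  # order within each node, then number the rows 1..3
  arrange(desc(weight), .by_group = TRUE) %>%
  mutate(rank = row_number()) %>%
  ungroup() %>%
  # one row per node; produces group1..group3 and weight1..weight3
  pivot_wider(
    id_cols = node,
    names_from = rank,
    values_from = c(group, weight),
    names_sep = ""
  )

print(top3_wide)
```

The columns come out as group1, group2, group3, weight1, weight2, weight3. To get the interleaved order from the question, reorder them afterwards with select().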

Answer 2

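We can use apply() to process each row: order the values from largest to smallest, then interleave the top 3 column names with their values.

res <- cbind(df1[1], t(apply(df1[-1], 1, function(x) {
  i1 <- order(-x)  # column positions, largest value first
  # interleave the top 3 column names with the top 3 values
  c(rbind(names(df1)[-1][i1][1:3], x[i1][1:3]))
})))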

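Because names and values were combined in one vector, every new column is character (or factor). We then convert each column back to a suitable type and name the columns:

# as.is = TRUE keeps text columns as character instead of factor
res[] <- lapply(res, function(x) type.convert(as.character(x), as.is = TRUE))
# yields group, weight, group.1, weight.1, group.2, weight.2
names(res)[-1] <- make.unique(rep(c("group", "weight"), (ncol(res)-1)/2))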
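
The key trick in this answer is c(rbind(a, b)), which interleaves two equal-length vectors. Below is a small self-contained sketch of the whole approach on made-up data. The data frame contents, the helper `top_n_row()` and the numbered column names are assumptions for illustration, not part of the original answer.

```r
# Hypothetical toy data standing in for `df1`
df1 <- data.frame(
  id = 1:2,
  V1 = c(0.10, 0.40),
  V2 = c(0.30, 0.05),
  V3 = c(0.20, 0.25),
  V4 = c(0.05, 0.30)
)

# rbind() stacks the vectors as rows; c() then reads the matrix column by
# column, giving a1, b1, a2, b2, ...
interleaved <- c(rbind(c("V2", "V3"), c("0.3", "0.2")))

# Hypothetical helper: names and values of the n largest entries of a named
# numeric vector, interleaved (values are coerced to character here)
top_n_row <- function(x, n = 3) {
  idx <- order(x, decreasing = TRUE)[seq_len(n)]
  c(rbind(names(x)[idx], x[idx]))
}

# apply() returns one column per input row, so transpose back to one row per id
wide <- t(apply(df1[-1], 1, top_n_row))
res <- data.frame(id = df1$id, wide, stringsAsFactors = FALSE)
names(res)[-1] <- paste0(c("group", "weight"), rep(1:3, each = 2))

# convert the character columns back to numeric where possible
res[] <- lapply(res, function(col) type.convert(as.character(col), as.is = TRUE))

print(res)
```

Since the values pass through a character representation, very long decimals may be rounded on the way back to numeric. If that matters, one option is to build the name columns and the value columns separately.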

2 answers

solution1
2 2016-06-13 14:26:23

solution2
0 ACCEPTED 2016-06-13 12:34:40
