
Subsetting at the row level, but value must be column name

Imagine a dataframe:

set.seed(1234)
data <- data.frame(id = sample(letters, 26, replace = FALSE),
                   a = sample(1:10, 26, replace = TRUE),
                   b = sample(1:10, 26, replace = TRUE),
                   c = sample(1:10, 26, replace = TRUE))

For each id, I'd like to keep the name of the column that holds the largest value.

The result I'm looking for is a 26 x 2 data frame with one column for id and one for largest_value_var, which would contain a, b, or c.

So far, I have been able to extract the name of the variable holding the maximum value with this:

apply(data[,-1], 1, function(x) c(names(x))[which.max(x)])

But I can't quite get the result into a data frame. Any help is appreciated.

You can do this fairly easily with max.col(). Setting ties.method = "first" (thanks akrun) returns the first of the tied columns when there is a tie; the default, "random", picks one of them at random. Here's a data.table method:

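library(data.table)
# setDT() converts `data` to a data.table in place; use as.data.table(data)
# instead if you still need the original data frame for the base R code below
setDT(data)[, names(.SD)[max.col(.SD, "first")], by = id]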
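To see what ties.method changes, here is a small sketch on a made-up 2 x 3 matrix (m is purely illustrative, not from the question) whose second row has a tie between columns a and c:

```r
# Illustrative matrix: row 1 has a unique max in column b,
# row 2 ties between columns a and c (both equal 4)
m <- matrix(c(1, 5, 2,
              4, 1, 4),
            nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("a", "b", "c")))

first_idx <- max.col(m, ties.method = "first")  # first tied column wins
last_idx  <- max.col(m, ties.method = "last")   # last tied column wins

colnames(m)[first_idx]  # "b" "a"
colnames(m)[last_idx]   # "b" "c"

# With the default ties.method = "random", row 2 may give 1 or 3
# from one call to the next
max.col(m)
```

Only the tied row is affected; row 1 gives column b under every method.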

Update: This seems to be more efficient in base R, probably because, with by = id, max.col() and the as.matrix() conversion inside it run once per id instead of once on the whole table. So here's one way to do it in base R:

cbind(data[1], largest = names(data)[-1][max.col(data[-1], "first")])

Thanks to Ananda Mahto for pointing out the efficiency difference.
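The base R one-liner names the new column largest. If you want exactly the 26 x 2 shape described in the question, with the column called largest_value_var, a minimal sketch could look like this (it recreates the example data; vals and result are just illustrative names):

```r
set.seed(1234)
data <- data.frame(id = sample(letters, 26, replace = FALSE),
                   a  = sample(1:10, 26, replace = TRUE),
                   b  = sample(1:10, 26, replace = TRUE),
                   c  = sample(1:10, 26, replace = TRUE))

vals <- data[-1]  # only the numeric columns a, b, c

# max.col() converts vals to a matrix and returns one column index per row
result <- data.frame(
  id = data$id,
  largest_value_var = names(vals)[max.col(vals, ties.method = "first")],
  stringsAsFactors = FALSE
)

dim(result)  # 26 2
```

stringsAsFactors = FALSE keeps largest_value_var as character in R versions before 4.0, where data.frame() would otherwise turn it into a factor.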

I like @Richard's use of max.col, but my first thought was to get the data into a "tidy" (long) form first, after which the subsetting you want is easy:

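library(reshape2)    # recent data.table versions ship their own melt(), so this may be unnecessary
library(data.table)
# long form: one row per (id, variable) pair, then keep the variable with the max value
melt(as.data.table(data), id.vars = "id")[, variable[which.max(value)], by = id]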
#     id V1
#  1:  c  b
#  2:  p  a
#  3:  o  c
#  4:  x  b
#  5:  s  a
## SNIP ###
# 21:  g  a
# 22:  f  b
# 23:  t  a
# 24:  y  a
# 25:  w  b
# 26:  v  a
#     id V1
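If you'd rather not depend on reshape2 or data.table, the same long-form idea can be sketched in base R with stack() and split(). This is an alternative written for illustration, not the answer's own code; long, by_id and largest are hypothetical names.

```r
set.seed(1234)
data <- data.frame(id = sample(letters, 26, replace = FALSE),
                   a  = sample(1:10, 26, replace = TRUE),
                   b  = sample(1:10, 26, replace = TRUE),
                   c  = sample(1:10, 26, replace = TRUE))

# Long form: stack() yields columns `values` and `ind`, one column after another
# (all a, then all b, then all c); rep() lines the ids up with each stacked column
long <- cbind(id = rep(data$id, times = ncol(data) - 1), stack(data[-1]))

# For each id, keep the variable name (ind) whose value is largest
by_id   <- split(long, long$id)
largest <- vapply(by_id,
                  function(d) as.character(d$ind[which.max(d$values)]),
                  character(1))

result <- data.frame(id = names(largest),
                     largest_value_var = unname(largest),
                     stringsAsFactors = FALSE)
```

Because stack() stacks the columns in the order a, b, c and split() keeps that order within each group, which.max() picks the first tied column, just like ties.method = "first". The result is ordered by id rather than in the original row order.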

To put the result of your apply() call into a data frame, you could do:

df <- data.frame(id = data$id,
                 largest_value_var = apply(data[, -1], 1, function(x) names(x)[which.max(x)]))

Note that c(names(x)) is the same as names(x), so I omitted c().
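Since which.max() returns the position of the first maximum, the apply() version breaks ties the same way as max.col(..., ties.method = "first"). A quick sketch to check that on the example data (via_apply and via_maxcol are illustrative names):

```r
set.seed(1234)
data <- data.frame(id = sample(letters, 26, replace = FALSE),
                   a  = sample(1:10, 26, replace = TRUE),
                   b  = sample(1:10, 26, replace = TRUE),
                   c  = sample(1:10, 26, replace = TRUE))

# Row-wise: apply() hands each row to the function as a named vector
via_apply <- apply(data[, -1], 1, function(x) names(x)[which.max(x)])

# Vectorised: one max.col() call over the whole data frame
via_maxcol <- names(data)[-1][max.col(data[, -1], ties.method = "first")]

# apply() may carry row names along, so compare the bare values
identical(unname(via_apply), via_maxcol)
```

The max.col() version avoids calling an R function once per row, which is why it tends to scale better on larger data.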

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this post, please indicate the site URL or the original address.

© 2020-2024 STACKOOM.COM