
How to get names of certain variables in a column in R?

I have a data frame that looks like:

    ID   CO1   CO2   ED1   ED2   max
    1     1     2     1     3     3
    2     1     3     3     2     3 
    3     4     2     2     1     4
    4     3     3     4     4     4
    ...
    10    1     1      1     1    1

How do I get R to give me the name(s) of the columns whose value equals the number in the max column, and assign them to a new column named “best”?

I want something like this:

    ID     CO1   CO2    ED1   ED2    max     best
    1       1     2      1     3      3       ED2         
    2       1     3      3     2      3       CO2
    3       4     2      2     1      4       CO1
    4       3     3      4     4      4       ED1
    ...
    10      1     1      1     1      1       CO2

If more than one column equals the value in the max column (as in row 2 or row 10), picking one of them at random is fine.

I have seen several solutions to similar problems, but none that works in my case.

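You can use max.col, which returns, for each row of a matrix, the index of the column holding the maximum. Applied to the logical matrix below, that is a column where the comparison is TRUE: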

cols <- grep('CO|ED', names(df), value = TRUE)
df$best <- cols[max.col(df[cols] == df$max)]
df

#  ID CO1 CO2 ED1 ED2 max best
#1  1   1   2   1   3   3  ED2
#2  2   1   3   3   2   3  CO2
#3  3   4   2   2   1   4  CO1
#4  4   3   3   4   4   4  ED1
#5 10   1   1   1   1   1  ED2

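Ties are broken at random by default, so the result for rows with several matches can change between runs. See the ties.method argument in ?max.col to get the first or last match in each row instead.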
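As a minimal sketch of the ties.method note above, the snippet below rebuilds the sample data with data.frame() and compares "first" and "last". The names hits, first_best and last_best are illustrative only and not part of the original answer.

```r
# Sample data, same values as in the answer's dput() output
df <- data.frame(
  ID  = c(1L, 2L, 3L, 4L, 10L),
  CO1 = c(1L, 1L, 4L, 3L, 1L),
  CO2 = c(2L, 3L, 2L, 3L, 1L),
  ED1 = c(1L, 3L, 2L, 4L, 1L),
  ED2 = c(3L, 2L, 1L, 4L, 1L),
  max = c(3L, 3L, 4L, 4L, 1L)
)

cols <- grep("CO|ED", names(df), value = TRUE)

# Logical matrix: TRUE where a column equals that row's max value
hits <- df[cols] == df$max

# Deterministic tie-breaking: leftmost or rightmost matching column
first_best <- cols[max.col(hits, ties.method = "first")]
last_best  <- cols[max.col(hits, ties.method = "last")]

first_best
# [1] "ED2" "CO2" "CO1" "ED1" "CO1"
last_best
# [1] "ED2" "ED1" "CO1" "ED2" "ED2"
```

With "first" or "last" the result is reproducible, whereas the default "random" may pick a different column on each run for rows 2, 4 and 10.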

Data:

df <- structure(list(ID = c(1L, 2L, 3L, 4L, 10L), CO1 = c(1L, 1L, 4L, 
3L, 1L), CO2 = c(2L, 3L, 2L, 3L, 1L), ED1 = c(1L, 3L, 2L, 4L, 
1L), ED2 = c(3L, 2L, 1L, 4L, 1L), max = c(3L, 3L, 4L, 4L, 1L)), 
row.names = c(NA, -5L), class = "data.frame")

No need to be overly fancy:


d <- read.table(text=
"    ID   CO1   CO2   ED1   ED2   max
    1     1     2     1     3     3
    2     1     3     3     2     3
    3     4     2     2     1     3
    4     3     3     4     4     4
    10    1     1      1     1    1
", header=TRUE )

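library(dplyr)  # provides %>%, select() and matches()

# For each row, position of the first maximum among the CO*/ED* columns
max.columns <- d %>% select(matches("CO|ED")) %>%
    apply( 1, which.max )

# +1 skips the leading ID column, so this relies on the column order above
d$best <- colnames(d)[ max.columns+1 ]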

d

Outputs:


> d
  ID CO1 CO2 ED1 ED2 max best
1  1   1   2   1   3   3  ED2
2  2   1   3   3   2   3  CO2
3  3   4   2   2   1   3  CO1
4  4   3   3   4   4   4  ED1
5 10   1   1   1   1   1  CO1
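The same which.max idea can probably be written in base R alone, indexing by column name rather than by position so that it does not depend on where the ID column sits. This is only a sketch; score_cols is a name made up for illustration.

```r
# Sample data, rebuilt here so the snippet runs on its own
d <- data.frame(
  ID  = c(1L, 2L, 3L, 4L, 10L),
  CO1 = c(1L, 1L, 4L, 3L, 1L),
  CO2 = c(2L, 3L, 2L, 3L, 1L),
  ED1 = c(1L, 3L, 2L, 4L, 1L),
  ED2 = c(3L, 2L, 1L, 4L, 1L),
  max = c(3L, 3L, 4L, 4L, 1L)
)

# Columns to search, selected by name prefix
score_cols <- grep("^(CO|ED)", names(d), value = TRUE)

# which.max() returns the position of the first maximum in each row,
# so ties always resolve to the leftmost column
d$best <- score_cols[apply(d[score_cols], 1, which.max)]

d$best
# [1] "ED2" "CO2" "CO1" "ED1" "CO1"
```

Note that this approach recomputes the row maximum itself and never reads the max column, so it only agrees with the question's requirement when max really is the largest of the searched columns.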

A longer base R solution, in which "best" holds the names of all columns that match the max value:

# Store as a variable the names of the raw data vectors:
# dvecs => character vector
dvecs <- setdiff(names(df), c("ID", "max"))

# Store a matrix of booleans denoting if the column contains the max value:
# bool_test => logical matrix
bool_test <- df$max == df[,dvecs]

# Store a vector containing the names of the columns with the max values:
# best => character vector
df$best <- apply(
  data.frame(
    vapply(
      seq_along(dvecs),
      function(i) {
        ifelse(bool_test[, i], dvecs[i], NA_character_)
      },
      character(nrow(bool_test))
    )
  ), 
  1, 
  function(x) {
    paste0(na.omit(x), collapse = ", ")
  }
)
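For comparison, a shorter sketch that should give the same comma-separated result on the sample data is shown below. It is an alternative formulation rather than the code from the answer above, and the argument name hit is illustrative.

```r
# Sample data, rebuilt here so the snippet runs on its own
df <- data.frame(
  ID  = c(1L, 2L, 3L, 4L, 10L),
  CO1 = c(1L, 1L, 4L, 3L, 1L),
  CO2 = c(2L, 3L, 2L, 3L, 1L),
  ED1 = c(1L, 3L, 2L, 4L, 1L),
  ED2 = c(3L, 2L, 1L, 4L, 1L),
  max = c(3L, 3L, 4L, 4L, 1L)
)

dvecs <- setdiff(names(df), c("ID", "max"))

# For each row, keep the names of every column equal to max and
# join them into one string
df$best <- unname(apply(
  df[dvecs] == df$max,
  1,
  function(hit) paste(dvecs[hit], collapse = ", ")
))

df$best
# [1] "ED2"                "CO2, ED1"           "CO1"
# [4] "ED1, ED2"           "CO1, CO2, ED1, ED2"
```

Keeping every tied name, instead of choosing one, leaves the decision about how to break ties to a later step.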
