
R - data.table: assign the name of the column holding the row minimum as the value of a new column

Thanks to both of you for suggesting elegant solutions. Both worked for me, but only the melt() and join-back solution worked for a data.table with dates instead of numeric values.

EDIT

I implemented the data.table solution proposed by Wimpel (melting, then joining the results back), as it also works with dates stored in the date columns, unlike the initial toy data, which was all integer values.

I preferred the readability of Peace Wang's solution, though: it uses plain data.table assignments, and IMO its syntax is much clearer than the melt() solution. However (at least for me), it does not work with columns of type Date.

Benchmarking both solutions on numeric/integer data showed the melt() solution as the clear winner.


EDIT 2: To replicate the NA values introduced by conversion that I get when I implement the solution proposed by Peace Wang, see below for the corrected version of the input data.table.

I have something like this: imagine a list of patient records with measurements taken at various dates. The colnames of the date columns would be something like "2020-12-15" / "2021-01-15" etc.

 ID   Date_1       Date_2      Date_3   
  1   1990-01-01   1990-02-01  1990-03-01      
  2   1990-01-01   1990-02-01  1990-03-01       
  3   1990-01-01   1982-02-01  1990-03-01 

I have determined the minimum value of each row in my data.table dt like this:

dt <- dt[, Min := do.call(pmin, c(.SD, list(na.rm = TRUE))), .SDcols = -(1)]
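To make the snippet above reproducible, the sample table can be built with real Date columns. This is only a minimal sketch: the values are copied from the printed table, and the column names Date_1 to Date_3 are assumed to match it.

```r
library(data.table)

# Sample data with Date columns (values taken from the table shown above)
dt <- data.table(
  ID     = 1:3,
  Date_1 = as.Date(c("1990-01-01", "1990-01-01", "1990-01-01")),
  Date_2 = as.Date(c("1990-02-01", "1990-02-01", "1982-02-01")),
  Date_3 = as.Date(c("1990-03-01", "1990-03-01", "1990-03-01"))
)

# Row-wise minimum over every column except the first one (ID).
# `:=` adds the column by reference, so no re-assignment is needed.
dt[, Min := do.call(pmin, c(.SD, list(na.rm = TRUE))), .SDcols = -1]
```

With this input, Min stays a Date column, which is the behaviour described as "so far so good".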

So far so good. Now I want to add a new column Min_Date stating the corresponding column name (aka date in my example) of the minimum value found per row, to finally get something like this:

  ID   Date_1       Date_2      Date_3        Min        Min_Date
  1    1990-01-01   1990-02-01  1990-03-01   1990-01-01  Date_1
  2    1990-01-01   1990-02-01  1990-03-01   1990-01-01  Date_1
  3    1990-01-01   1982-02-01  1990-03-01   1982-02-01  Date_2

I tried variations of:

dt <- dt[, Min_Date := do.call(which.pmin, c(.SD, list(na.rm = TRUE))),
                           .SDcols = (2:4)]

and then trying to do something with the column index. I don't really know my way around .I yet, and I couldn't make it work when used in something along these lines:

exclusions.dt[exclusions.dt[, .I[which.min(.SD)], ISSUE_ID, .SDcols = (2:6)]$V1]

Would appreciate any pointer!

The following code should work.

dt <- fread("
             ID   Date_1   Date_2   Date_3
  1    100      200      300
  2    100      500      300
  3    200      150      400
")
dt[, `:=`(Min = do.call(pmin, c(.SD, list(na.rm = TRUE))),
          date_min = colnames(.SD)[apply(.SD, 1, which.min)]
         ), 
   .SDcols = -1]

If you get a warning about NAs, try re-creating dt from scratch first.
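For Date columns, the NA warning probably comes from apply() itself: it converts .SD to a matrix first, and a matrix built from Date columns is a character matrix, so which.min() has to coerce strings like "1990-01-01" to numbers. One possible workaround, sketched here under the assumption that the date columns are named Date_1 to Date_3, is to run which.min() on a numeric copy of the dates:

```r
library(data.table)

dt <- data.table(
  ID     = 1:3,
  Date_1 = as.Date(c("1990-01-01", "1990-01-01", "1990-01-01")),
  Date_2 = as.Date(c("1990-02-01", "1990-02-01", "1982-02-01")),
  Date_3 = as.Date(c("1990-03-01", "1990-03-01", "1990-03-01"))
)
date_cols <- c("Date_1", "Date_2", "Date_3")  # assumed column names

# Numeric copy of the date columns (days since 1970-01-01) as a matrix,
# so that apply() never sees character data
m <- as.matrix(dt[, lapply(.SD, as.numeric), .SDcols = date_cols])

# pmin() works on Date columns directly and keeps the Date class
dt[, Min := do.call(pmin, c(.SD, list(na.rm = TRUE))), .SDcols = date_cols]

# Index of the smallest value per row, mapped back to the column name.
# Note: a row whose dates are all NA is not handled by this sketch.
dt[, Min_Date := date_cols[apply(m, 1, which.min)]]
```

The original date columns are left untouched; only the helper matrix m is numeric.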

Here is another data.table approach

#sample data
library( data.table )
DT <- fread("ID   Date_1   Date_2   Date_3   
  1    100      200      300      
  2    100      500      300      
  3    200      150      400    ")

#melt to long format and get rows with minimum value by ID
DT.min <- melt( DT, id.vars = "ID" )[ , .SD[ which.min(value) ], by = ID]
#    ID variable value
# 1:  1   Date_1   100
# 2:  2   Date_1   100
# 3:  3   Date_2   150

#join back to DT
DT[ DT.min, `:=`( Min = i.value, Min_Date = i.variable ), on = .(ID)][]
#    ID Date_1 Date_2 Date_3 Min Min_Date
# 1:  1    100    200    300 100   Date_1
# 2:  2    100    500    300 100   Date_1
# 3:  3    200    150    400 150   Date_2
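The same melt-and-join steps can be tried on Date columns, which is the case reported in the question's edit as working. A minimal sketch, with the sample values taken from the question's table:

```r
library(data.table)

# Same layout as above, but with Date values instead of integers
DT <- data.table(
  ID     = 1:3,
  Date_1 = as.Date(c("1990-01-01", "1990-01-01", "1990-01-01")),
  Date_2 = as.Date(c("1990-02-01", "1990-02-01", "1982-02-01")),
  Date_3 = as.Date(c("1990-03-01", "1990-03-01", "1990-03-01"))
)

# Melt to long format and keep, per ID, the row with the earliest date
DT.min <- melt(DT, id.vars = "ID")[, .SD[which.min(value)], by = ID]

# Join back to DT; Min_Date holds the name of the column with the minimum
DT[DT.min, `:=`(Min = i.value, Min_Date = i.variable), on = .(ID)]
```

Min_Date comes back as a factor because melt() stores the variable column as a factor by default; wrap it in as.character() if a plain string is preferred.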


 