Rank values in r datatable grouped by another variable

Question

I would like to use datatable's frank function to rank the date column by id. However, my rankings only seem to take into consideration the date column and not the id corresponding to it. I also receive 6 of these warnings that I'm not sure about:

1..... 6: In [.data.table (dups, , := (rank, frank(dups, date, ties.method = "average")), : RHS 1 is length 10 (greater than the size (1) of group 6). The last 9 element(s) will be discarded.

dups <- data.table (id = c('11', '11', '11', '22','22',
  '88', '99','44','44', '55'),
  date = mdy(c("1-01-2016", "1-02-2016", "1-02-2016","2-01-2016", 
  "2-02-2016")))

so.sample <- dups[, rank := frank(dups, date, ties.method = "average"), by = id]

For example, id = 11 and date = 2016-01-01 should rank 1 instead of 1.5, because there is only one id and date with that combination.

thx for help

Answer 1

It works just fine with both, 'rank' and 'frank'. Maybe your date variable was not formatted correctly. Here is the code:

dt1 <- data.table (id = c('11', '11', '11', '22','22',
                      '88', '99','44','44', '55'),
               date = as.Date(c("01-01-2016", 
                                "01-02-2016", 
                                "01-02-2016",
                                "02-01-2016", 
                                "02-02-2016"),
                              format = "%m-%d-%Y"))
setkey(dt1, date)
setkey(dt1, id)
dt1

    id       date
 1: 11 2016-01-01
 2: 11 2016-01-02
 3: 11 2016-01-02
 4: 22 2016-02-01
 5: 22 2016-02-02
 6: 44 2016-01-02
 7: 44 2016-02-01
 8: 55 2016-02-02
 9: 88 2016-01-01
10: 99 2016-01-02

dt1[, rank := frank(date),
    by = list(id)]
dt1

    id       date  rank
 1: 11 2016-01-01   1.0
 2: 11 2016-01-02   2.5
 3: 11 2016-01-02   2.5
 4: 22 2016-02-01   1.0
 5: 22 2016-02-02   2.0
 6: 44 2016-01-02   1.0
 7: 44 2016-02-01   2.0
 8: 55 2016-02-02   1.0
 9: 88 2016-01-01   1.0
10: 99 2016-01-02   1.0

Additionally, if you just want to enumerate your records, using .N can be quite helpful:

dt1[, Visit := 1:.N,
    by = list(id)]
dt1

    id       date rank Visit
 1: 11 2016-01-01  1.0     1
 2: 11 2016-01-02  2.5     2
 3: 11 2016-01-02  2.5     3
 4: 22 2016-02-01  1.0     1
 5: 22 2016-02-02  2.0     2
 6: 44 2016-01-02  1.0     1
 7: 44 2016-02-01  2.0     2
 8: 55 2016-02-02  1.0     1
 9: 88 2016-01-01  1.0     1
10: 99 2016-01-02  1.0     1

I hope this helps.

Rank values in r datatable grouped by another variable

Question

1 answers

solution1
10 ACCPTED 2016-12-01 21:12:03

Rank values in r datatable grouped by another variable

Question

1 answers

solution1 10 ACCPTED 2016-12-01 21:12:03

solution1
10 ACCPTED 2016-12-01 21:12:03