I have come across this vignette at https://cran.r-project.org/web/packages/data.table/vignettes/datatable-keys-fast-subset.html#multiple-key-point .
My data looks like this:
ID TYPE MEASURE_1 MEASURE_2
1 A 3 3
1 B 4 4
1 C 5 5
1 Mean 4 4
2 A 10 1
2 B 20 2
2 C 30 3
2 Mean 20 2
When I do the following, everything works as expected:
setkey(dt, ID, TYPE)
dt[.(unique(ID), "A")] # extract SD of all IDs with Type A
dt[.(unique(ID), "B")] # extract SD of all IDs with Type B
dt[.(unique(ID), "C")] # extract SD of all IDs with Type C
Whenever I try something like the following, where I want the keyed subset to match multiple values of the second key, I only get the combinations of all unique values of key 1 with the first value of the vector c() given for the second key. So it only uses the first value in the vector and ignores all the following values.
# extract SD of all IDs with one of the 3 types A/B/C
dt[.(unique(ID), c("A", "B", "C"))]
# previous output is equivalent to
dt[.(unique(ID), "A")] # extract SD of all IDs with Type A
# I want/expect
dt[TYPE %in% c("A", "B", "C")]
What am I missing here, or is this something I cannot do with keyed subsets?
To clarify: since I cannot leave out key 1 in a keyed subset, the vignette calls for including the first key as unique(key1).
Supplying multiple values for key 1 also works as expected.
dt[.(c(1, 2), "A")] == dt[ID %in% c(1,2) & TYPE == "A"] # TRUE
In the data.table documentation (see help("data.table") or https://rdatatable.gitlab.io/data.table/reference/data.table.html#arguments ), it is mentioned:
character, list and data.frame input to i is converted into a data.table internally using as.data.table.
So, the classical recycling rule used in R (or in data.frame) applies. That is, .(unique(ID), c("A", "B", "C")), which is equivalent to list(unique(ID), c("A", "B", "C")), becomes:
as.data.table(list(unique(ID), c("A", "B", "C")))
and since the length of the longest list element (the length of c("A", "B", "C")) is not a multiple of the length of the shorter one (the length of unique(ID)), you will get an error. If you want each value in unique(ID) combined with each element in c("A", "B", "C"), you should use CJ(unique(ID), c("A", "B", "C")) instead.
So what you should do is dt[CJ(unique(ID), c("A", "B", "C"))].
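As a minimal sketch of the CJ() approach, the snippet below rebuilds the example table from the question (the values are copied from it; the variable names lookup, keyed and scanned are only illustrative) and compares the keyed join with the %in% subset the question asks for.

```r
library(data.table)

# Example data copied from the question
dt <- data.table(
  ID        = rep(1:2, each = 4),
  TYPE      = rep(c("A", "B", "C", "Mean"), times = 2),
  MEASURE_1 = c(3, 4, 5, 4, 10, 20, 30, 20),
  MEASURE_2 = c(3, 4, 5, 4, 1, 2, 3, 2)
)
setkey(dt, ID, TYPE)

# CJ() builds every combination of its arguments (a cross join):
# 2 IDs x 3 types = 6 lookup rows
lookup <- CJ(unique(dt$ID), c("A", "B", "C"))

keyed   <- dt[lookup]                      # keyed join on (ID, TYPE)
scanned <- dt[TYPE %in% c("A", "B", "C")]  # vector scan from the question

# Both approaches select the same rows
nrow(keyed)
identical(keyed$MEASURE_1, scanned$MEASURE_1)
```

Inside dt[...] the columns of dt are visible, so the shorter form dt[CJ(unique(ID), c("A", "B", "C"))] from the answer does the same thing without the intermediate lookup table.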
Note that dt[.(unique(ID), "A")] works correctly because you passed only one element for the second key, and this gets recycled to match the length of unique(ID).
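The length-1 recycling can be seen directly by converting the list yourself. This is only an illustration of the conversion step quoted from the documentation; ids stands in for unique(ID) and the name recycled is made up for the example.

```r
library(data.table)

ids <- c(1L, 2L)  # stand-in for unique(ID) in the question's data

# The single "A" is repeated once per ID, giving one lookup row per ID
recycled <- as.data.table(list(ids, "A"))
recycled$V2
nrow(recycled)
```

Each row of this converted table is then looked up against the key columns, which is why every ID is paired with "A".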