So I have two data.tables.
size_categories = data.table(category = c("S", "M", "L"), size_min = c(0, 10, 25),
size_max = c(10, 25, Inf), bin = c("blue", "red", "green"))
products = data.table(object_id = 1:10, size = seq(1, 37, 4))
I want to merge the tables such that each row of the product table is assigned a bin and size category based on its size.
The ham-fisted way I know would be to assign assign a category to each row on products and then merging
products[size >= 0 & size < 10, category := "S"]
products[size >= 10 & size < 25, category := "M"]
products[size >= 25, category := "L"]
merge(products, size_categories)
Of course this is not flexible at all and I would have to rewrite it if size_categories changed.
I am open to using other packages, but would prefer a solution just using data.table.
Thanks!
I would do it with non-equi join:
products[size_categories, `:=`(category = i.category, bin = i.bin),
on = .(size >= size_min, size < size_max)]
# > products
# object_id size category bin
# 1: 1 1 S blue
# 2: 2 5 S blue
# 3: 3 9 S blue
# 4: 4 13 M red
# 5: 5 17 M red
# 6: 6 21 M red
# 7: 7 25 L green
# 8: 8 29 L green
# 9: 9 33 L green
# 10: 10 37 L green
For reference, here's an approach using foverlaps
:
foverlaps(setkey(size_categories, size_min, size_max),
setkey(products[, size2 := size], size, size2))[, size2 := NULL][]
# object_id size category size_min size_max bin
# 1: 1 1 S 0 10 blue
# 2: 2 5 S 0 10 blue
# 3: 3 9 S 0 10 blue
# 4: 4 13 M 10 25 red
# 5: 5 17 M 10 25 red
# 6: 6 21 M 10 25 red
# 7: 7 25 M 10 25 red
# 8: 7 25 L 25 Inf green
# 9: 8 29 L 25 Inf green
# 10: 9 33 L 25 Inf green
# 11: 10 37 L 25 Inf green
It would probably be helpful in cases where your "size_categories" table has more columns that you want included in the final output.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.