
Assign data.table column values based on a value being in a certain range

I have two data.tables:

    library(data.table)

    size_categories = data.table(category = c("S", "M", "L"), size_min = c(0, 10, 25), 
                           size_max = c(10, 25, Inf), bin = c("blue", "red", "green"))

    products = data.table(object_id = 1:10, size = seq(1, 37, 4))

I want to merge the tables such that each row of the products table is assigned a bin and a size category based on its size.

The ham-fisted way I know would be to assign a category to each row of products and then merge:

    products[size >= 0 & size < 10, category := "S"]
    products[size >= 10 & size < 25, category := "M"]
    products[size >= 25, category := "L"]
    merge(products, size_categories)

Of course, this is not flexible at all, and I would have to rewrite it if size_categories changed.

I am open to using other packages, but would prefer a solution using only data.table.

Thanks!

I would do it with a non-equi join:

    products[size_categories, `:=`(category = i.category, bin = i.bin),
        on = .(size >= size_min, size < size_max)]
    # > products
    #     object_id size category   bin
    #  1:         1    1        S  blue
    #  2:         2    5        S  blue
    #  3:         3    9        S  blue
    #  4:         4   13        M   red
    #  5:         5   17        M   red
    #  6:         6   21        M   red
    #  7:         7   25        L green
    #  8:         8   29        L green
    #  9:         9   33        L green
    # 10:        10   37        L green
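
One thing the update join does not do is flag sizes that fall outside every range; such rows simply end up with NA in the new columns. A minimal sketch of checking for that, assuming a hypothetical products table that contains a negative size (this data is made up for illustration and is not part of the original question):

```r
library(data.table)

size_categories <- data.table(category = c("S", "M", "L"),
                              size_min = c(0, 10, 25),
                              size_max = c(10, 25, Inf),
                              bin = c("blue", "red", "green"))

# Hypothetical data: -3 lies below every size_min, 10 and 25 sit on boundaries
products <- data.table(object_id = 1:4, size = c(-3, 9, 10, 25))

# Same non-equi update join as in the answer above
products[size_categories, `:=`(category = i.category, bin = i.bin),
         on = .(size >= size_min, size < size_max)]

# Rows that matched no range keep NA in the newly added columns
unmatched <- products[is.na(category)]
print(unmatched)
```

If unmatched is not empty, the ranges in size_categories probably need to be extended before relying on the result.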

For reference, here's an approach using foverlaps:

    foverlaps(setkey(size_categories, size_min, size_max), 
              setkey(products[, size2 := size], size, size2))[, size2 := NULL][]
    #     object_id size category size_min size_max   bin
    #  1:         1    1        S        0       10  blue
    #  2:         2    5        S        0       10  blue
    #  3:         3    9        S        0       10  blue
    #  4:         4   13        M       10       25   red
    #  5:         5   17        M       10       25   red
    #  6:         6   21        M       10       25   red
    #  7:         7   25        M       10       25   red
    #  8:         7   25        L       25      Inf green
    #  9:         8   29        L       25      Inf green
    # 10:         9   33        L       25      Inf green
    # 11:        10   37        L       25      Inf green

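It would probably be helpful in cases where your size_categories table has more columns that you want included in the final output. Note that foverlaps treats both interval endpoints as inclusive, so a size of exactly 25 matches both M and L, which is why object 7 appears twice in the output above.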
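
The foverlaps output above lists object 7 twice, because a size of 25 touches both the M and the L interval. If the half-open ranges of the non-equi join are wanted instead, one possible workaround (a sketch, not part of the original answer) is to filter the overlap result afterwards:

```r
library(data.table)

size_categories <- data.table(category = c("S", "M", "L"),
                              size_min = c(0, 10, 25),
                              size_max = c(10, 25, Inf),
                              bin = c("blue", "red", "green"))
products <- data.table(object_id = 1:10, size = seq(1, 37, 4))

# foverlaps needs an interval on both sides, so size is duplicated
# to form a zero-width interval [size, size2]
products[, size2 := size]
setkey(size_categories, size_min, size_max)
setkey(products, size, size2)

res <- foverlaps(size_categories, products)
res[, size2 := NULL]

# Keep only matches where size is strictly below the upper bound,
# which drops the duplicate match on the shared boundary
res <- res[size < size_max]
print(res)
```

This keeps the extra columns from size_categories while assigning each product to a single category, assuming the ranges in size_categories are contiguous and do not otherwise overlap.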
