How to rename levels in an R data.table from another data.table?

Question

I have two data.tables: dt is a long one with an integer column levels in the range 1...5, and the other, labels, contains the labels in a simple form like this:

labels <- data.table(V1=1:5, V2=c("Very Low", "Low", "Median", "High", "Very High"))
#    V1       V2
# 1:  1       Very Low
# 2:  2       Low
# 3:  3       Median
# 4:  4       High
# 5:  5       Very High

The actual dt is rather large, but for reproducibility a simple one will do (though in the real dt the levels are not that regular):

dt <- data.table(levels=rep(1:5, times=10))

How could I replace the levels column in dt with the character labels from labels in one go?

I could do this in a manual loop (ugly!), or I could do it by adding another column, like this:

dt[, tmp := labels$V2[levels]]

and then dropping the levels column and renaming tmp.

Is there a good data.table way to do so?

Answer 1

The easiest approach is to join the data.tables. To show the effect, I added an id column to dt (see below). You can join the data.tables as follows:

dt[labels, on=c("levels"="V1")][order(id)] # the [order(id)] part is not necessary, but added to show the effect better

which gives (first 7 rows):

    levels id        V2
 1:      1  1  Very Low
 2:      2  2       Low
 3:      3  3    Median
 4:      4  4      High
 5:      5  5 Very High
 6:      1  6  Very Low
 7:      2  7       Low
....

Or probably even better:

dt <- dt[labels, .(id,levels=V2), on=c("levels"="V1")][order(id)]

which gives (first 7 rows):

> dt
    id    levels
 1:  1  Very Low
 2:  2       Low
 3:  3    Median
 4:  4      High
 5:  5 Very High
 6:  6  Very Low
 7:  7       Low
....

Another option is to use the match function with the labels data.table as a lookup table:

dt[, levels := labels$V2[match(levels, labels$V1)]]

which gives:

> dt
       levels id
 1:  Very Low  1
 2:       Low  2
 3:    Median  3
 4:      High  4
 5: Very High  5
 6:  Very Low  6
 7:       Low  7
....
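
As a minimal sketch of why the match lookup is more general than indexing labels$V2 directly by the code, the example below uses keys that are not 1..n. The lookup and x tables and their values are hypothetical, made up for illustration only.

```r
library(data.table)

# Hypothetical lookup table whose keys are not 1..n (illustration only)
lookup <- data.table(V1 = c(10L, 20L, 30L), V2 = c("Low", "Median", "High"))
x <- data.table(levels = c(30L, 10L, 20L, 10L, 40L))

# match() returns, for each value, its row position in lookup$V1 (NA if absent)
pos <- match(x$levels, lookup$V1)

# Index the label vector by position; codes without a match become NA
x[, levels := lookup$V2[pos]]
```

Because the whole column is replaced, its type changes from integer to character, and the unmatched code 40 ends up as NA.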

Used data:

dt <- data.table(levels=rep(1:5, times=10))[,id:=.I]
labels <- data.table(V1=1:5, V2=c("Very Low", "Low", "Median", "High", "Very High"))
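
For a large dt, a variant worth considering is an update join, which adds the labels by reference and keeps the rows of dt in their original order, so no reordering by id is needed. This is a sketch rather than part of the original answer; the temporary column name lab is an assumption chosen for illustration.

```r
library(data.table)

dt <- data.table(levels = rep(1:5, times = 10))[, id := .I]
labels <- data.table(V1 = 1:5, V2 = c("Very Low", "Low", "Median", "High", "Very High"))

# Update join: for each row of dt that matches a row of labels, copy V2
# into a new column (lab is a hypothetical temporary name)
dt[labels, on = c(levels = "V1"), lab := i.V2]

# Drop the integer column and give the label column its name, both by reference
dt[, levels := NULL]
setnames(dt, "lab", "levels")
```

The labels go into a new column first because the join assigns only to the matched rows, and character values cannot be written into the existing integer column that way.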

Answer 2

Suppose that your datasets are generated like this:

 dt <- data.table(levels=rep(1:5, times=10))
 labels <- data.table(V1=1:5, V2=c("Very Low", "Low", "Median", "High", "Very High"))

Then you can "relabel" the levels of dt using the factor function:

dt[, levels := as.character(factor(levels, labels = labels$V2))]

If you don't mind the levels column being of type factor, you can skip the as.character and just do:

dt[, levels := factor(levels, labels = labels$V2)]
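
When only labels is given, factor pairs the labels with the sorted unique values found in the column, so the call relies on every code actually occurring in the data. Since the question notes that the real levels are not that regular, a slightly more defensive sketch passes the codes explicitly. The small dt below, in which level 3 never occurs, is hypothetical.

```r
library(data.table)

labels <- data.table(V1 = 1:5, V2 = c("Very Low", "Low", "Median", "High", "Very High"))
# Hypothetical data in which level 3 never occurs
dt <- data.table(levels = c(5L, 1L, 2L, 4L, 1L))

# Passing levels = labels$V1 pairs each code with its label explicitly
dt[, levels := factor(levels, levels = labels$V1, labels = labels$V2)]
```

The resulting factor still carries all five labels as its levels, including the unused "Median".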

2 solutions

Solution 1
5 2015-10-24 07:24:02

Solution 2
3 accepted 2015-10-24 05:20:43
