
How to rename levels in an R data.table from another data.table?

I have two data.tables: dt, a long one with an integer column levels in the range 1...5, and labels, which holds the label for each integer in a simple form like this:

labels <- data.table(V1=1:5, V2=c("Very Low", "Low", "Median", "High", "Very High"))
#    V1       V2
# 1:  1       Very Low
# 2:  2       Low
# 3:  3       Median
# 4:  4       High
# 5:  5       Very High

The actual dt is rather large, but for reproducibility a simple one will do (in the real dt the levels are not this regular):

dt <- data.table(levels=rep(1:5, times=10))

How can I replace the levels column in dt with the character labels from labels in one go?

I could do this in a manual loop (ugly!), or I could do it by adding another column, like this:

dt[, tmp := labels$V2[levels]]

and then dropping the levels column and renaming tmp.
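Spelled out in full, that workaround might look like the sketch below. It assumes, as in the example data, that labels$V1 is exactly 1, 2, ..., n in order, so the integer codes can be used directly as positions in labels$V2.

```r
library(data.table)

labels <- data.table(V1 = 1:5,
                     V2 = c("Very Low", "Low", "Median", "High", "Very High"))
dt <- data.table(levels = rep(1:5, times = 10))

# Positional lookup: only valid because V1 is 1..5 in order (assumption).
dt[, tmp := labels$V2[levels]]

# Drop the integer column, then rename tmp by reference.
dt[, levels := NULL]
setnames(dt, "tmp", "levels")
```

After these three steps dt has a single character column named levels.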

Is there a good data.table way to do this?

The easiest approach is to join the data.tables. To show the effect, I added an id column to dt (see below). You can join the data.tables as follows:

dt[labels, on=c("levels"="V1")][order(id)] # the [order(id)] part is not necessary, but added to show the effect better

which gives (first 7 rows):

    levels id        V2
 1:      1  1  Very Low
 2:      2  2       Low
 3:      3  3    Median
 4:      4  4      High
 5:      5  5 Very High
 6:      1  6  Very Low
 7:      2  7       Low
....

Or probably even better:

dt <- dt[labels, .(id,levels=V2), on=c("levels"="V1")][order(id)]

which gives (first 7 rows):

> dt
    id    levels
 1:  1  Very Low
 2:  2       Low
 3:  3    Median
 4:  4      High
 5:  5 Very High
 6:  6  Very Low
 7:  7       Low
....
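A related variant, not part of the answer above, is an update join: the same on= join, but with := in j so that the label is added to dt by reference. This is a sketch under the assumption that keeping the original row order and avoiding a reassignment of dt matter; the column name label is chosen here purely for illustration.

```r
library(data.table)

dt <- data.table(levels = rep(1:5, times = 10))[, id := .I]
labels <- data.table(V1 = 1:5,
                     V2 = c("Very Low", "Low", "Median", "High", "Very High"))

# Update join: for each row of dt whose levels matches labels$V1,
# write labels$V2 (referred to as i.V2) into a new column.
# "label" is a hypothetical column name.
dt[labels, on = .(levels = V1), label := i.V2]
```

Because the assignment happens in place, dt keeps its row order and no order(id) step is needed. Rows of dt with no match in labels would get NA in the new column instead of being dropped.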

Another option is to use the match function with the labels data.table as a lookup table:

dt[, levels := labels$V2[match(levels, labels$V1)]]

which gives:

> dt
       levels id
 1:  Very Low  1
 2:       Low  2
 3:    Median  3
 4:      High  4
 5: Very High  5
 6:  Very Low  6
 7:       Low  7
....
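Since the question mentions that the real levels are less regular, here is a small sketch of how the match lookup behaves when the codes are not 1..n and one code has no label. The tables codes and x and their values are made up for illustration.

```r
library(data.table)

# Hypothetical lookup table with non-contiguous codes.
codes <- data.table(V1 = c(10L, 20L, 30L),
                    V2 = c("Low", "Median", "High"))

# Hypothetical data; 40 has no entry in the lookup table.
x <- data.table(levels = c(20L, 10L, 40L, 30L))

# match() returns the row position in codes$V1, or NA when there is no match.
x[, levels := codes$V2[match(levels, codes$V1)]]

x$levels
# "Median" "Low" NA "High"
```

Unmatched codes come back as NA rather than raising an error, so it may be worth checking anyNA(x$levels) afterwards.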

Used data:

dt <- data.table(levels=rep(1:5, times=10))[,id:=.I]
labels <- data.table(V1=1:5, V2=c("Very Low", "Low", "Median", "High", "Very High"))

Suppose that your datasets are generated like this:

 dt <- data.table(levels=rep(1:5, times=10))
 labels <- data.table(V1=1:5, V2=c("Very Low", "Low", "Median", "High", "Very High"))

Then you can "relabel" the levels of dt using the factor function:

dt[, levels := as.character(factor(levels, labels = labels$V2))]

If you don't mind levels being of type factor, you can skip the as.character and just do:

dt[, levels := factor(levels, labels = labels$V2)]
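One caveat: factor with only labels = pairs the labels with the sorted unique values actually present in the column, so it depends on every code occurring in dt. A slightly more defensive sketch passes the codes explicitly through the levels argument of factor; the small dt below is made up to show a column in which codes 2 and 4 never occur.

```r
library(data.table)

labels <- data.table(V1 = 1:5,
                     V2 = c("Very Low", "Low", "Median", "High", "Very High"))

# Hypothetical data in which codes 2 and 4 are absent.
dt <- data.table(levels = c(5L, 3L, 3L, 1L))

# Map each code in labels$V1 to the label in the same row of labels$V2.
dt[, levels := factor(levels, levels = labels$V1, labels = labels$V2)]

as.character(dt$levels)
# "Very High" "Median" "Median" "Very Low"
```

With the codes given explicitly, the mapping no longer depends on which values happen to appear in the data, and the resulting factor keeps all five labels as its levels.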

