简体   繁体   中英

How to rename levels in an R data.table from another data.table?

I have two data.tables, dt is a long one with an integer column levels in the range 1...5, and another data.table "labels" containing labels in a simple form like this:

labels <- data.table(V1=1:5, V2=c("Very Low", "Low", "Median", "High", "Very High"))
#    V1       V2
# 1:  1       Very Low
# 2:  2       Low
# 3:  3       Median
# 4:  4       High
# 5:  5       Very High

The actual dt is rather large, but for reproducibility a simple one will do (though in real DT levels are not that regular):

dt <- data.table(levels=rep(1:5, times=10))

How I could replace levels column in dt with character labels from labels in one go?

I could do this in manual loop (ugly!), or I could do this by adding another column, like this:

dt[, tmp := labels$V2[dt$level] ]

and then dropping column level and renaming tmp .

Is there a good data.table way to do so?

The easiest approach is joining the data.tables . In order to show the effect I added an id column to dt (see below). You can join the data.tables as follows:

dt[labels, on=c("levels"="V1")][order(id)] # the [order(id)] part is not necessary, but added to show the effect better

which gives (first 7 rows):

    levels id        V2
 1:      1  1  Very Low
 2:      2  2       Low
 3:      3  3    Median
 4:      4  4      High
 5:      5  5 Very High
 6:      1  6  Very Low
 7:      2  7       Low
....

Or probably even better:

dt <- dt[labels, .(id,levels=V2), on=c("levels"="V1")][order(id)]

which gives (first 7 rows):

> dt
    id    levels
 1:  1  Very Low
 2:  2       Low
 3:  3    Median
 4:  4      High
 5:  5 Very High
 6:  6  Very Low
 7:  7       Low
....

Another option is to use the match function with the labels data.table as a lookup table:

dt[, levels := labels$V2[match(levels, labels$V1)]]

which gives:

> dt
       levels id
 1:  Very Low  1
 2:       Low  2
 3:    Median  3
 4:      High  4
 5: Very High  5
 6:  Very Low  6
 7:       Low  7
....

Used data:

dt <- data.table(levels=rep(1:5, times=10))[,id:=.I]
labels <- data.table(V1=1:5, V2=c("Very Low", "Low", "Median", "High", "Very High"))

Suppose that your datasets are generated like this:

 dt <- data.table(levels=rep(1:5, times=10))
 labels <- data.table(V1=1:5, V2=c("Very Low", "Low", "Median", "High", "Very High"))

Then you can "relabel" the levels of dt using the factor function:

dt[, level := as.character(factor(level, labels = labels$V2))]

If you don't mind level being of type factor , you can skip the as.character and just do:

dt[, level := factor(level, labels = labels$V2)]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM