Related to: Reshaping data.frame from wide to long format.
Which is the best method to reshape data from wide to long format?
Setting aside personal taste and code readability, which performs better?
Is there any other reason to prefer one approach over the others?
Example data:
v <- 1:3
names(v) <- paste0("col_", 1:3)
d <- purrr::map_df(v, function(x) runif(5, 0, 1))
d$id <- 1:5
# # A tibble: 5 x 4
# col_1 col_2 col_3 id
# <dbl> <dbl> <dbl> <int>
# 1 0.262 0.755 0.132 1
# 2 0.306 0.0344 0.571 2
# 3 0.143 0.628 0.933 3
# 4 0.401 0.709 0.629 4
# 5 0.353 0.691 0.405 5
Wide-to-long methods and desired output:
library(dplyr)
# tidyr
d %>% tidyr::gather("key", "value", -id) %>% head()
# reshape2
reshape2::melt(d, id.vars=c("id")) %>% head()
# data.table (dt must first be created from d)
dt <- data.table::as.data.table(d)
data.table::melt(dt, id.vars=c("id")) %>% head()
# output:
# id variable value
# 1 1 col_1 0.2618043
# 2 2 col_1 0.3059923
# 3 3 col_1 0.1433476
# 4 4 col_1 0.4007300
# 5 5 col_1 0.3531845
# 6 1 col_2 0.7550252
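As an aside, `tidyr::gather()` has since been superseded; in tidyr >= 1.0.0 the recommended function is `pivot_longer()`. A minimal sketch of the same reshape (column names `"variable"`/`"value"` chosen here to match the melt output above):

```r
library(dplyr)
library(tidyr)

# Same toy data as above
v <- 1:3
names(v) <- paste0("col_", 1:3)
d <- purrr::map_df(v, function(x) runif(5, 0, 1))
d$id <- 1:5

# pivot_longer(): cols selects the columns to stack (everything but id)
d %>%
  pivot_longer(cols = -id, names_to = "variable", values_to = "value") %>%
  head()
```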
Performance-wise, on bigger examples reshape2::melt comes out fastest,
although to be honest we are only talking about milliseconds.
Example with bigger numbers:
# bigger numbers example
v <- 1:100
names(v) <- paste0("col_", 1:100)
d <- purrr::map_df(v, function(x) runif(100000, 0, 1))
d$id <- 1:100000
dt <- data.table::as.data.table(d) # for the data.table method
Microbenchmark:
microbenchmark::microbenchmark(
  gather = {tidyr::gather(d, "key", "value", -id)},
  melt = {reshape2::melt(d, id.vars=c("id"))},
  dt = {data.table::melt(dt, id.vars=c("id"))},
times = 100
)
# Unit: milliseconds
# expr min lq mean median uq max neval
# gather 50.75434 51.61489 64.90385 61.06314 68.10268 201.7082 100
# melt 12.08457 12.58755 19.77046 13.13005 22.29556 162.0733 100
# dt 42.80573 44.50143 50.62359 45.04182 54.15235 187.1778 100
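Beyond speed, it is worth confirming that the methods actually agree. A quick sanity check (a sketch, assuming tidyr and reshape2 are installed): after sorting rows identically, the value columns should match.

```r
library(dplyr)

# Small toy data, as in the question
v <- 1:3
names(v) <- paste0("col_", 1:3)
d <- purrr::map_df(v, function(x) runif(5, 0, 1))
d$id <- 1:5

# gather() with matching column names, then sort for comparison
a <- d %>%
  tidyr::gather("variable", "value", -id) %>%
  arrange(variable, id)

# melt() sorted the same way ("variable" is a factor here, but the
# level order col_1 < col_2 < col_3 matches the alphabetical sort)
b <- reshape2::melt(d, id.vars = "id") %>%
  arrange(variable, id)

stopifnot(isTRUE(all.equal(a$value, b$value)))
```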