Stack a named list of Dates into a data.frame
I am doing a school data-cleaning project with the tidyverse package. Now I get a list output from purrr::map() like this:
(mylist <- list(A = as.Date(sample(1e3:1e4, 4), origin = "1960-01-01"),
B = as.Date(sample(1e3:1e4, 2), origin = "1960-01-01"),
C = as.Date(sample(1e3:1e4, 3), origin = "1960-01-01")))
$A
[1] "1970-06-12" "1984-05-28" "1967-06-28" "1982-12-14"
$B
[1] "1966-02-04" "1967-02-21"
$C
[1] "1977-07-19" "1968-03-11" "1964-02-13"
I want to stack them into a data frame like this:
library(purrr)  # for reduce()
df <- data.frame(Value = reduce(mylist, c))
df$Class <- rep(names(mylist), sapply(mylist, length))
df
Value Class
1 1970-06-12 A
2 1984-05-28 A
3 1967-06-28 A
4 1982-12-14 A
5 1966-02-04 B
6 1967-02-21 B
7 1977-07-19 C
8 1968-03-11 C
9 1964-02-13 C
Note that the elements are actually of Date class, and stack(mylist) does not work on a list of Dates. Is there an efficient way to do this with functions from the tidyverse or other packages?
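The failure is likely due to how stack() filters its input: it keeps only list elements for which is.vector() is TRUE, and a Date vector carries a class attribute, so is.vector() returns FALSE for it. A minimal base R sketch of a workaround, assuming it is acceptable to drop the class before stacking and restore it afterwards (the toy list `dates` is illustrative, not the question's data):

```r
# Toy named list of Date vectors (illustrative data)
dates <- list(A = as.Date(c("1970-06-12", "1984-05-28")),
              B = as.Date("1966-02-04"))

# The "class" attribute makes is.vector() return FALSE for a Date
is.vector(dates$A)  # FALSE

# Drop the class (days since 1970-01-01), stack, then restore Date
s <- stack(lapply(dates, as.numeric))
s$values <- as.Date(s$values, origin = "1970-01-01")
s
```

As with any stack() call, the ind column comes back as a factor.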
For the original version of the question, we can use stack:
stack(mylist)
# values ind
#1 1 A
#2 2 A
#3 3 A
#4 4 B
#5 5 B
#6 6 B
#7 7 B
#8 8 C
#9 9 C
For the updated question, either melt() from reshape2 or data.table can be used:
library(reshape2)
melt(mylist)
# value L1
#1 3658-09-23 A
#2 2390-06-01 A
#3 2744-01-09 A
#4 2432-02-21 A
#5 4077-11-13 B
#6 4022-11-13 B
#7 3923-11-19 C
#8 2836-08-20 C
#9 3411-01-23 C
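melt() labels the values as value and the list names as L1. If the column names from the question are wanted, the result can be renamed afterwards. A small sketch, assuming reshape2 is installed; the fixed dates below stand in for the randomly sampled mylist:

```r
library(reshape2)

# Deterministic stand-in for the question's mylist
mylist <- list(A = as.Date(c("1970-06-12", "1984-05-28")),
               B = as.Date(c("1966-02-04", "1967-02-21")),
               C = as.Date("1977-07-19"))

out <- melt(mylist)
# Rename melt()'s default columns to match the desired output
names(out)[names(out) == "value"] <- "Value"
names(out)[names(out) == "L1"] <- "Class"
out
```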
Or, using the tidyverse:
library(tidyverse)
enframe(mylist) %>%
  unnest(cols = value)  # tidyr >= 1.0 expects the list-column in cols
# A tibble: 9 x 2
# name value
# <chr> <date>
#1 A 3658-09-23
#2 A 2390-06-01
#3 A 2744-01-09
#4 A 2432-02-21
#5 B 4077-11-13
#6 B 4022-11-13
#7 C 3923-11-19
#8 C 2836-08-20
#9 C 3411-01-23
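enframe() also accepts name and value arguments, so the columns can be named to match the question's desired output before unnesting. A sketch, assuming tibble and tidyr 1.0 or later are installed; the fixed dates stand in for the random ones:

```r
library(tibble)  # enframe()
library(tidyr)   # unnest()

# Deterministic stand-in for the question's mylist
mylist <- list(A = as.Date(c("1970-06-12", "1984-05-28")),
               B = as.Date(c("1966-02-04", "1967-02-21")),
               C = as.Date("1977-07-19"))

# One row per list element, with the Date vectors in a list-column "Value"
nested <- enframe(mylist, name = "Class", value = "Value")

# Expand the list-column to one row per date; the Date class is kept
out <- unnest(nested, cols = Value)
out
```

For a single list-column like this, tidyr::unnest_longer(nested, Value) should give the same result.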
Here are some benchmarks of the solutions posted:
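library(data.table)  # rbindlist(), as.data.table()
library(tidyverse)   # enframe(), unnest(), reduce()
mylist2 <- rep(mylist, 1e4)  # bigger list: 10,000 copies of mylist (30,000 elements)
system.time({data.table::melt(mylist2)})  # Sotos' solution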
#user system elapsed
# 3.432 0.025 3.436
system.time({reshape2::melt(mylist2)})
#user system elapsed
# 3.461 0.021 3.472
The melt() functions from the two packages perform almost identically. This is expected: for input that is not a data.table, such as this list, data.table::melt() redirects to reshape2::melt().
system.time({rbindlist(lapply(mylist2, as.data.table), idcol = "Class")})
# user system elapsed
# 2.889 0.160 3.029
system.time({enframe(mylist2) %>%
  unnest(cols = value)})
# user system elapsed
# 0.149 0.004 0.152
system.time({data.frame(value = reduce(mylist2, c),
class = rep(names(mylist2), lengths(mylist2)))})
# user system elapsed
# 14.714 8.890 23.550
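The base R version is by far the slowest here, most likely because Reduce(c, ...) calls c() once per list element and copies the growing result each time, so the work grows roughly quadratically with the number of elements, whereas do.call(c, ...) concatenates everything in a single call. A rough sketch comparing the two on an illustrative list (timings vary by machine, so none are shown):

```r
# Illustrative input: 10,000 short Date vectors (15,000 dates in total)
big <- rep(list(A = as.Date(c("1970-06-12", "1984-05-28")),
                B = as.Date("1966-02-04")), 5000)

# Reduce() evaluates c(acc, big[[i]]) for each element, copying the
# accumulated vector every time
t_reduce <- system.time(v1 <- Reduce(c, big))

# do.call() passes every element to a single c() call;
# unname() keeps c() from adding element names
t_docall <- system.time(v2 <- do.call(c, unname(big)))

identical(v1, v2)
c(reduce = t_reduce[["elapsed"]], do.call = t_docall[["elapsed"]])
```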
library(microbenchmark)
microbenchmark(
ak1 = reshape2::melt(mylist2),
  ak2 = enframe(mylist2) %>%
    unnest(cols = value),
  st1 = data.table::melt(mylist2),
  st2 = rbindlist(lapply(mylist2, as.data.table), idcol = "Class"),
unit = 'relative', times = 10L)
#Unit: relative
# expr min lq mean median uq max neval cld
# ak1 26.51333 24.38488 21.52345 23.65577 19.36023 18.71291 10 c
# ak2 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 10 a
# st1 25.34465 24.56934 21.97353 26.27053 19.23078 17.32802 10 c
# st2 16.19268 16.96023 14.28353 16.25288 12.69549 11.58340 10 b
We can use the data.table package, which is very efficient:
library(data.table)
# idcol takes TRUE or a single column name; the list names fill that column
rbindlist(lapply(mylist, as.data.table), idcol = "Class")
# Class V1
#1: A 4394-02-08
#2: A 4580-05-16
#3: A 2476-01-24
#4: A 2928-11-03
#5: B 4652-12-02
#6: B 3758-02-20
#7: C 2331-09-07
#8: C 3092-02-15
#9: C 3494-03-07
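A variant that names both columns to match the question's desired output, assuming data.table is installed: each element is wrapped in a one-column data.table called Value, and rbindlist()'s idcol argument stores the list names in a column called Class. The fixed dates stand in for the random ones:

```r
library(data.table)

# Deterministic stand-in for the question's mylist
mylist <- list(A = as.Date(c("1970-06-12", "1984-05-28")),
               B = as.Date(c("1966-02-04", "1967-02-21")),
               C = as.Date("1977-07-19"))

# One small data.table per element; rbindlist() adds the list names
# as a leading id column named "Class"
dt <- rbindlist(lapply(mylist, function(x) data.table(Value = x)),
                idcol = "Class")
dt
```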
Additionally, data.table::melt() will also do the job (similar to @akrun's reshape2 solution), i.e.
data.table::melt(mylist)
We already have data.table and tidyverse solutions (which are highly efficient), but for completeness' sake here is a base R approach:
data.frame(value = Reduce(c, mylist), class = rep(names(mylist), lengths(mylist)))
# value class
#1 1983-04-14 A
#2 1979-01-15 A
#3 1977-08-22 A
#4 1974-06-12 A
#5 1975-07-10 B
#6 1980-02-08 B
#7 1986-11-29 C
#8 1984-03-31 C
#9 1985-03-24 C
Reduce can also be replaced with do.call, which concatenates all the elements in a single c() call:
# unname() keeps c() from naming the values A1, A2, ..., which would become row names
data.frame(value = do.call(c, unname(mylist)), class = rep(names(mylist), lengths(mylist)))
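unlist() would be the obvious first choice, but it drops the Date class and returns plain day counts, which is why c() is used here. The same pieces can be wrapped into a stack()-like helper that keeps the element class; stack_keep_class() is a hypothetical name, and the sketch assumes every element of the list has the same class:

```r
# Toy named list of Date vectors (illustrative data)
mylist <- list(A = as.Date(c("1970-06-12", "1984-05-28")),
               B = as.Date(c("1966-02-04", "1967-02-21")),
               C = as.Date("1977-07-19"))

# unlist() drops the Date class and returns plain day counts
class(unlist(mylist))  # "numeric"

# Hypothetical helper: a stack()-like function that keeps the element class
stack_keep_class <- function(x) {
  data.frame(
    # one c() call; unname() avoids A1, A2, ... row names
    values = do.call(c, unname(x)),
    # same factor layout as stack(): levels in list order
    ind = factor(rep(names(x), lengths(x)), levels = unique(names(x)))
  )
}
stack_keep_class(mylist)
```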