R - Nested list to tibble
I have a nested list like this:
> ex <- list(list(c("This", "is", "an", "example", "."), c("I", "really", "hate", "examples", ".")), list(c("How", "do", "you", "feel", "about", "examples", "?")))
> ex
[[1]]
[[1]][[1]]
[1] "This" "is" "an" "example" "."
[[1]][[2]]
[1] "I" "really" "hate" "examples" "."
[[2]]
[[2]][[1]]
[1] "How" "do" "you" "feel" "about" "examples" "?"
I want to convert it into a tibble like this:
> tibble(d_id = as.integer(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2)),
+ s_id = as.integer(c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1)),
+ t_id = as.integer(c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7)),
+ token = c("This", "is", "an", "example", ".", "I", "really",
+ "hate", "examples", ".", "How", "do", "you", "feel", "about", "examples", "?"))
# A tibble: 17 x 4
d_id s_id t_id token
<int> <int> <int> <chr>
1 1 1 1 This
2 1 1 2 is
3 1 1 3 an
4 1 1 4 example
5 1 1 5 .
6 1 2 1 I
7 1 2 2 really
8 1 2 3 hate
9 1 2 4 examples
10 1 2 5 .
11 2 1 1 How
12 2 1 2 do
13 2 1 3 you
14 2 1 4 feel
15 2 1 5 about
16 2 1 6 examples
17 2 1 7 ?
What is the most efficient way to do this? Preferably using tidyverse functions?
We can do:
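library(dplyr)
library(purrr)
ex %>%
  set_names(seq_along(ex)) %>%                        # name the documents 1, 2, ...
  map(~ set_names(.x, seq_along(.x)) %>% stack) %>%   # name the sentences, then stack to long form
  bind_rows(.id = 'd_id') %>%
  group_by(d_id, s_id = ind) %>%
  mutate(t_id = row_number()) %>%                     # token position within each sentence
  select(d_id, s_id, t_id, token = values)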
# A tibble: 17 x 4
# Groups: d_id, s_id [3]
# d_id s_id t_id token
# <chr> <chr> <int> <chr>
# 1 1 1 1 This
# 2 1 1 2 is
# 3 1 1 3 an
# 4 1 1 4 example
# 5 1 1 5 .
# 6 1 2 1 I
# 7 1 2 2 really
# 8 1 2 3 hate
# 9 1 2 4 examples
#10 1 2 5 .
#11 2 1 1 How
#12 2 1 2 do
#13 2 1 3 you
#14 2 1 4 feel
#15 2 1 5 about
#16 2 1 6 examples
#17 2 1 7 ?
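The pipeline above returns d_id and s_id as character columns (the printed output shows `<chr>`), while the question asks for integers. As a sketch of the same name-then-`stack()` idea that keeps all three id columns integer, here is a base-R version. The names `per_doc` and `res` are illustrative, and it assumes R >= 4.0.

```r
ex <- list(list(c("This", "is", "an", "example", "."),
                c("I", "really", "hate", "examples", ".")),
           list(c("How", "do", "you", "feel", "about", "examples", "?")))

# One data frame per document: name its sentences 1, 2, ... and let
# stack() turn them into long form (columns `values` and factor `ind`)
per_doc <- lapply(seq_along(ex), function(i) {
  s <- stack(setNames(ex[[i]], seq_along(ex[[i]])))
  data.frame(
    d_id  = i,                                              # document index (integer)
    s_id  = as.integer(as.character(s$ind)),                # sentence name back to integer
    t_id  = ave(seq_along(s$ind), s$ind, FUN = seq_along),  # running count within sentence
    token = s$values,
    stringsAsFactors = FALSE
  )
})
res <- do.call(rbind, per_doc)
```

Converting with `as.integer(as.character(s$ind))` rather than `as.integer(s$ind)` uses the factor labels instead of its internal level codes, so it does not depend on the level order.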
Time to put some sequence() calls to work; this should be very efficient:
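d_id <- rep(seq_along(ex), lengths(ex))           # document id of each sentence
s_id <- sequence(lengths(ex))                     # sentence id within its document
n_tok <- lengths(unlist(ex, recursive = FALSE))   # number of tokens in each sentence
data.frame(
  d_id = rep(d_id, n_tok),
  s_id = rep(s_id, n_tok),
  t_id = sequence(n_tok),
  token = unlist(ex)
)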
# d_id s_id t_id token
#1 1 1 1 This
#2 1 1 2 is
#3 1 1 3 an
#4 1 1 4 example
#5 1 1 5 .
#6 1 2 1 I
#7 1 2 2 really
#8 1 2 3 hate
#9 1 2 4 examples
#10 1 2 5 .
#11 2 1 1 How
#12 2 1 2 do
#13 2 1 3 you
#14 2 1 4 feel
#15 2 1 5 about
#16 2 1 6 examples
#17 2 1 7 ?
For a 500K sample of the ex list, this runs in about 2 seconds. I suspect it will be hard to beat in terms of efficiency.
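To see why this works, it helps to look at the intermediate vectors for `ex`: `lengths()` counts elements at one level, `sequence()` turns a vector of lengths into concatenated `1:n` runs, and `rep(..., times)` expands each per-sentence id to one entry per token. Every step is a single vectorised call with no per-element R function, which is plausibly why it scales well. The variable names here are illustrative, and the comments show the values these calls return for the `ex` above.

```r
ex <- list(list(c("This", "is", "an", "example", "."),
                c("I", "really", "hate", "examples", ".")),
           list(c("How", "do", "you", "feel", "about", "examples", "?")))

sent_per_doc <- lengths(ex)                              # 2 1: sentences per document
d_per_sent   <- rep(seq_along(ex), sent_per_doc)         # 1 1 2: document of each sentence
s_per_sent   <- sequence(sent_per_doc)                   # 1 2 1: sentence id within document
tok_per_sent <- lengths(unlist(ex, recursive = FALSE))   # 5 5 7: tokens per sentence
t_id <- sequence(tok_per_sent)           # 1..5, 1..5, 1..7: token id within sentence
d_id <- rep(d_per_sent, tok_per_sent)    # ten 1s followed by seven 2s
```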
You can use melt from the reshape2 package:
library(data.table)
# call reshape2's melt explicitly, since data.table also exports a melt() generic
setDT(reshape2::melt(ex))[, .(d_id = L1, s_id = L2, t_id = rowid(L1, L2), token = value)]
d_id s_id t_id token
1: 1 1 1 This
2: 1 1 2 is
3: 1 1 3 an
4: 1 1 4 example
5: 1 1 5 .
6: 1 2 1 I
7: 1 2 2 really
8: 1 2 3 hate
9: 1 2 4 examples
10: 1 2 5 .
11: 2 1 1 How
12: 2 1 2 do
13: 2 1 3 you
14: 2 1 4 feel
15: 2 1 5 about
16: 2 1 6 examples
17: 2 1 7 ?
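I show it with data.table here because I know how to do the column selection and renaming in one step there (though dplyr should handle it without problems too). The melt.list method comes from reshape2.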
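The only non-trivial step in the data.table call is `rowid(L1, L2)`, which numbers rows 1, 2, ... within each (L1, L2) group. As a package-free sketch of that step, the block below builds a stand-in for the melted table by hand (only the `value`, `L1` and `L2` columns the answer uses; this is not reshape2's code, and its column order may differ) and computes the same counter with base R's `ave()`. The names `melted` and `out` are hypothetical.

```r
ex <- list(list(c("This", "is", "an", "example", "."),
                c("I", "really", "hate", "examples", ".")),
           list(c("How", "do", "you", "feel", "about", "examples", "?")))

# Hand-built stand-in for melt(ex): one row per token, with
# L1 = document index and L2 = sentence index within the document
melted <- data.frame(
  value = unlist(ex),
  L1 = rep(seq_along(ex), vapply(ex, function(d) sum(lengths(d)), integer(1))),
  L2 = unlist(lapply(ex, function(d) rep(seq_along(d), lengths(d)))),
  stringsAsFactors = FALSE
)

# Counterpart of rowid(L1, L2): a running count within each (L1, L2) group
melted$t_id <- ave(seq_len(nrow(melted)), melted$L1, melted$L2, FUN = seq_along)

# Same selection and renaming as the data.table j-expression
out <- with(melted, data.frame(d_id = L1, s_id = L2, t_id = t_id, token = value,
                               stringsAsFactors = FALSE))
```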
Another tidyverse solution:
library(tidyverse)
ex %>%
modify_depth(-1,~tibble(token=.x) %>% rowid_to_column("t_id")) %>%
map(~map_dfr(.x,identity,.id = "s_id")) %>%
map_dfr(identity,.id = "d_id")
# # A tibble: 17 x 4
# d_id s_id t_id token
# <chr> <chr> <int> <chr>
# 1 1 1 1 This
# 2 1 1 2 is
# 3 1 1 3 an
# 4 1 1 4 example
# 5 1 1 5 .
# 6 1 2 1 I
# 7 1 2 2 really
# 8 1 2 3 hate
# 9 1 2 4 examples
# 10 1 2 5 .
# 11 2 1 1 How
# 12 2 1 2 do
# 13 2 1 3 you
# 14 2 1 4 feel
# 15 2 1 5 about
# 16 2 1 6 examples
# 17 2 1 7 ?