Denormalize/coerce a list (with nested vectors) into a data.frame in R

Question

I'm reading a YAML file like

- person_id: 111
  person_name: Russell
  time:
  - 1
  - 2
  - 3
  value:
  - a
  - b
  - c
- person_id: 222
  person_name: Steven
  time:
  - 1
  - 2
  value:
  - d
  - e

that I want to denormalize to:

  person_id person_name time value
1       111     Russell    1     a
2       111     Russell    2     b
3       111     Russell    3     c
4       222      Steven    1     d
5       222      Steven    2     e

I have a solution, but I was hoping there is something more concise. Here's the nested list:

l <- list(
  list( 
    person_id   = 111L,
    person_name = "Russell", 
    time        = 1:3, 
    value       = letters[1:3]
  ),
  list( 
    person_id   = 222L,
    person_name = "Steven", 
    time        = 1:2, 
    value       = letters[4:5]
  )
)

Regarding possible duplicates, this question is similar to (1) "How to denormalize nested list in R?", but the structure is different (the round / diff / saldo structure is transposed compared to time / value here), and to (2) "Split comma-separated column into separate rows", but time is a vector instead of a comma-separated element like director. I'm hoping this different structure helps.

Answer 1

Reduce(rbind, lapply(l, data.frame))

Answer 2

To complement the ideas/approaches by @lmo and @submartingale, here's a purrr/tidyverse version that converts each nested list into a data.frame/tibble (by replicating the parent elements of name & id), then stacks them into a single tibble.

library(magrittr)  # provides the %>% pipe
l %>%
  purrr::map_df(tibble::as_tibble)
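The two stages described above (convert each person, then stack) can be seen separately in a minimal base R sketch. This is an analog rather than the purrr code itself, and the names `pieces` and `stacked` are illustrative only.

```r
l <- list(
  list(person_id = 111L, person_name = "Russell", time = 1:3, value = letters[1:3]),
  list(person_id = 222L, person_name = "Steven",  time = 1:2, value = letters[4:5])
)

# Stage 1: convert each person to its own data frame.
# The length-1 elements (person_id, person_name) are recycled
# to match the length of time/value.
pieces <- lapply(l, as.data.frame)
sapply(pieces, nrow)  # one row count per person

# Stage 2: stack the per-person data frames into one.
stacked <- do.call(rbind, pieces)
stacked
```

Inspecting `pieces` before stacking is a quick way to confirm that the parent fields were replicated as intended.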

Thanks guys for proposing something so concise and generalizable.

Answer 3

A simple base R method is to use lapply and data.frame to return a list of data.frames, and then use do.call with rbind to combine them into a single data.frame object.

do.call(rbind, lapply(l, data.frame))

which returns

  person_id person_name time value
1       111     Russell    1     a
2       111     Russell    2     b
3       111     Russell    3     c
4       222      Steven    1     d
5       222      Steven    2     e

Note that person_name and value will be factor vectors (in R versions before 4.0.0, where stringsAsFactors defaults to TRUE), which can be annoying to work with. If desired, you can keep these as character vectors using the stringsAsFactors argument.

do.call(rbind, lapply(l, data.frame, stringsAsFactors=FALSE))

The printed output looks the same, but the underlying data types of these two variables have changed.
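Since the printed output hides the difference, one way to check it is to compare the column classes directly. The sketch below assumes the same list `l` as in the question; `with_default` and `as_character` are illustrative names. What the first call reports depends on the R version in use.

```r
l <- list(
  list(person_id = 111L, person_name = "Russell", time = 1:3, value = letters[1:3]),
  list(person_id = 222L, person_name = "Steven",  time = 1:2, value = letters[4:5])
)

with_default <- do.call(rbind, lapply(l, data.frame))
as_character <- do.call(rbind, lapply(l, data.frame, stringsAsFactors = FALSE))

# Compare the class of each column; printing the data frames does not show this.
sapply(with_default, class)
sapply(as_character, class)
```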

Answer 4

This works, but is less than ideal because (a) each vector in the new data.frame needs to be handled and (b) the type of each vector is explicit (e.g., purrr::map_chr vs purrr::map_int).

library(magrittr)  # provides the %>% pipe used below

# Step 1: Determine how many times the 'parent' rows need to be replicated.
values_per_person <- l %>% 
  purrr::modify_depth(2, length) %>% 
  purrr::map_int("value")

# Step 2: Pull out the parent rows and replicate the elements to match `time`.
id_replicated <- l %>% 
  purrr::map_int("person_id") %>% 
  rep(times=values_per_person)    
name_replicated <- l %>%
  purrr::map_chr("person_name") %>% 
  rep(times=values_per_person)

# Step 3: Pull out the nested/child rows.
time <- l %>%
  purrr::modify_depth(1, "time") %>% 
  purrr::flatten_int()
value <- l %>%
  purrr::modify_depth(1, "value") %>% 
  purrr::flatten_chr()

# Step 4: Combine the vectors in a data frame.
data.frame(
  person_id   = id_replicated,
  person_name = name_replicated,
  time        = time,
  value       = value
)
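For comparison, the same four steps can be sketched in base R without purrr. This is a translation for illustration, not the answer's own code, and it shares the same drawbacks: every column is handled by hand and its type is spelled out (`integer(1)`, `character(1)`). The names `n` and `result` are illustrative.

```r
l <- list(
  list(person_id = 111L, person_name = "Russell", time = 1:3, value = letters[1:3]),
  list(person_id = 222L, person_name = "Steven",  time = 1:2, value = letters[4:5])
)

# Step 1: number of child rows per person.
n <- lengths(lapply(l, `[[`, "value"))

# Steps 2-4: replicate the parent fields, flatten the child fields, and combine.
result <- data.frame(
  person_id   = rep(vapply(l, `[[`, integer(1),   "person_id"),   times = n),
  person_name = rep(vapply(l, `[[`, character(1), "person_name"), times = n),
  time        = unlist(lapply(l, `[[`, "time")),
  value       = unlist(lapply(l, `[[`, "value")),
  stringsAsFactors = FALSE
)
result
```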

Answer 5

(Four years later and I'm still using this once or twice a month.) The yaml package provides a map handler. In this case, each map/person is converted into a tibble. Then dplyr::bind_rows() stacks all the tibbles to create a longer, single tibble.

path_yaml |> # Replace this line with code below to see a working example.
  yaml::read_yaml(
    handlers = list(map = \(x) tibble::as_tibble(x))
  ) |> 
  dplyr::bind_rows()

Extra details: with this simple dataset, the handler isn't even required; bind_rows() converts each piece automatically. But I'm skeptical that it will always know how to coerce each map before stacking. Plus, this explicit handler better communicates the intent.

If you want to play with a reproducible example, replace the file path (i.e., the first line) with

string <- 
"- person_id: 111
  person_name: Russell
  time:
  - 1
  - 2
  - 3
  value:
  - a
  - b
  - c
- person_id: 222
  person_name: Steven
  time:
  - 1
  - 2
  value:
  - d
  - e
"

textConnection(string) |> 
  yaml::read_yaml(...
