Using tidyverse to “unnest” a data.frame column inside a tibble

I'm working with data returned from a web call, which jsonlite and as_tibble somehow convert into a tibble containing a data.frame column.

The result has an integer Id column and an ActionCode data.frame column with two inner columns (Code and Name). These show in the console as:

> result
# A tibble: 117 x 2
  Id    ActionCode$Code $Name 
  <int> <chr>           <chr>
  1     A1              First Code
  2     A2              Second Code
  3     A3              Third Code
  4     A4              Fourth Code
  ...

Inspecting it with str() gives:

> result %>% str()
tibble [117 x 2] (S3: tbl_df/tbl/data.frame)
 $ Id : int [1:117] 1 2 3 4 ...
 $ ActionCode:'data.frame': 117 obs. of  2 variables:
  ..$ Code: chr [1:117] "A1" "A2" "A3" "A4" ...
  ..$ Name: chr [1:117] "First Code" "Second Code" "Third Code" "Fourth Code" ...

I've seen from, e.g., https://tibble.tidyverse.org/articles/types.html that this sort of data.frame column is perfectly legal, but I'm struggling to work out how to access the data in this column from tidy dplyr pipelines; for example, I can't select(ActionCode$Code).

Is there a way to work with these columns in dplyr pipelines? Or is there a way to flatten them, similar to how unnest() is used on list columns? (I realise that here I'm not creating extra rows; I'm only flattening the column hierarchy.)

That is, I'm looking for a function foo that produces:

> result %>% foo() %>% str()
tibble [117 x 3] (S3: tbl_df/tbl/data.frame)
 $ Id : int [1:117] 1 2 3 4 ...
 $ Code: chr [1:117] "A1" "A2" "A3" "A4" ...
 $ Name: chr [1:117] "First Code" "Second Code" "Third Code" "Fourth Code" ...

I can't share the web call itself, but as a reproducible example, I think the data I get looks something like:

library(tibble)

sample_data <- tibble(
  Id = 1:10,
  ActionCode = tibble(
    Code = paste0("Id", 1:10),
    Name = paste0("Name ", 1:10)
  )
)
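For reference, a minimal sketch (assuming dplyr 1.0 or later): inside data-masking verbs such as mutate() and filter(), ActionCode evaluates to an ordinary data frame, so its inner columns can be reached with $ even though select(ActionCode$Code) fails. The object names flat and filtered are only illustrative.

```r
library(dplyr)
library(tibble)

# Rebuild the sample data: Id plus a data.frame column ActionCode
sample_data <- tibble(
  Id = 1:10,
  ActionCode = tibble(
    Code = paste0("Id", 1:10),
    Name = paste0("Name ", 1:10)
  )
)

# Inside mutate(), ActionCode is a data frame, so $ pulls out its columns;
# the original df-column is then dropped with select()
flat <- sample_data %>%
  mutate(Code = ActionCode$Code, Name = ActionCode$Name) %>%
  select(-ActionCode)

# The same $ access works in filter() without flattening first
filtered <- sample_data %>%
  filter(ActionCode$Code == "Id3")

str(flat)
```

This works, but it means naming every inner column by hand, which gets tedious for wider nested data.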

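Converting back to a plain data.frame with do.call() flattens the columns. data.frame() names the expanded columns ActionCode.Code and ActionCode.Name, so the prefix is stripped afterwards: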

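library(dplyr)
library(stringr)

# data.frame() expands the ActionCode tibble into ActionCode.Code and
# ActionCode.Name; strip that prefix, then convert back to a tibble
do.call(data.frame, sample_data) %>%
    rename_with(~ str_remove(.x, '^ActionCode\\.'), starts_with('ActionCode')) %>%
    as_tibble()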

Output:

# A tibble: 10 x 3
#      Id Code  Name   
#   <int> <chr> <chr>  
# 1     1 Id1   Name 1 
# 2     2 Id2   Name 2 
# 3     3 Id3   Name 3 
# 4     4 Id4   Name 4 
# 5     5 Id5   Name 5 
# 6     6 Id6   Name 6 
# 7     7 Id7   Name 7 
# 8     8 Id8   Name 8 
# 9     9 Id9   Name 9 
#10    10 Id10  Name 10
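Since tidyr 1.0.0 there is also a dedicated verb for this, unpack(), which is essentially the foo the question asks for: it replaces a df-column with its inner columns and leaves the row count unchanged. A minimal sketch on the sample data, assuming tidyr 1.0.0 or later is installed:

```r
library(tibble)
library(tidyr)

sample_data <- tibble(
  Id = 1:10,
  ActionCode = tibble(
    Code = paste0("Id", 1:10),
    Name = paste0("Name ", 1:10)
  )
)

# unpack() promotes the columns of ActionCode to top-level columns;
# unlike unnest() on a list-column, no rows are added
flat <- unpack(sample_data, ActionCode)

# names_sep keeps the outer name as a prefix (ActionCode_Code, ...),
# which avoids name clashes if several df-columns share inner names
prefixed <- unpack(sample_data, ActionCode, names_sep = "_")

str(flat)
```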

The other solution works, but it's worth pointing out that data.table handles this situation automatically: converting with as.data.table() flattens the data.frame column.

library(tibble)

sample_data <- tibble(
  Id = 1:10,
  ActionCode = tibble(
    Code = paste0("Id", 1:10),
    Name = paste0("Name ", 1:10)
  )
)

library(data.table)
as.data.table(sample_data)
#>     Id ActionCode.Code ActionCode.Name
#>  1:  1             Id1          Name 1
#>  2:  2             Id2          Name 2
#>  3:  3             Id3          Name 3
#>  4:  4             Id4          Name 4
#>  5:  5             Id5          Name 5
#>  6:  6             Id6          Name 6
#>  7:  7             Id7          Name 7
#>  8:  8             Id8          Name 8
#>  9:  9             Id9          Name 9
#> 10: 10            Id10         Name 10
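To get exactly the Id, Code, Name layout from the question, the prefix can then be dropped in place with data.table's setnames(). A minimal sketch, assuming the flattened names always follow the ActionCode.&lt;inner&gt; pattern shown above; the name dt is only illustrative:

```r
library(tibble)
library(data.table)

sample_data <- tibble(
  Id = 1:10,
  ActionCode = tibble(
    Code = paste0("Id", 1:10),
    Name = paste0("Name ", 1:10)
  )
)

# as.data.table() expands the df-column into ActionCode.Code / ActionCode.Name
dt <- as.data.table(sample_data)

# With only one name vector supplied, setnames() replaces all column names
# by reference; here it removes the "ActionCode." prefix
setnames(dt, sub("^ActionCode\\.", "", names(dt)))

dt
```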
