How to extract identically named vectors from nested lists where the list names vary, using purrr?

I have to work with some data that is in recursive lists like this (simplified reproducible example below):

groups
#> $group1
#> $group1$countries
#> [1] "USA" "JPN"
#> 
#> 
#> $group2
#> $group2$countries
#> [1] "AUS" "GBR"

Code for data input below:

chars <- c("USA", "JPN")
chars2 <- c("AUS", "GBR")

group1 <- list(countries = chars)
group2 <- list(countries = chars2)

groups <- list(group1 = group1, group2 = group2)
groups

I'm trying to work out how to extract the vectors that are in the lists without having to write a line of code for each group by hand. The code below works, but my real data has a large number of groups (and the number of groups will change), so I'd like to extract all of the vectors in a more efficient manner. This is the brute-force way that works:

countries1 <- groups$group1$countries
countries2 <- groups$group2$countries

In the example, the bottom-level vector I'm trying to extract is always called countries, but the lists that contain them have different names, varying only by their number.

Would there be an easy purrr solution? Or tidyverse solution? Or other solution?

Add some additional cases to your list

groups[["group3"]] <- list()
groups[["group4"]] <- list(foo = letters[1:2])
groups[["group5"]] <- list(foo = letters[1:2], countries = LETTERS[1:2])

Here's a function that extracts the element named "countries" from a list; it returns NULL if the list has no such element

fun = function(x)
    x[["countries"]]

Map your original list to contain just the elements you're interested in

interesting <- Map(fun, groups)
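
Since the question asks about purrr, the same extraction step can be sketched with purrr::map(), which accepts a name in place of a function and returns NULL where that name is missing. This is only an illustration on a small made-up copy of the list; by_name and present are names chosen for the example, not part of the original answer.

```r
library(purrr)

# A small copy of the example data, including groups without "countries"
groups <- list(
  group1 = list(countries = c("USA", "JPN")),
  group2 = list(countries = c("AUS", "GBR")),
  group3 = list(),
  group4 = list(foo = letters[1:2])
)

# Passing a string to map() extracts the element of that name from each group;
# groups that lack it yield NULL
by_name <- map(groups, "countries")

# compact() drops the NULL entries, keeping only groups that had "countries"
present <- compact(by_name)
names(present)
```

Whether to keep or drop the NULL entries depends on what comes next: keeping them preserves one entry per group, while compact() leaves only the groups that have data.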

Then transform these into a data.frame using a combination of unlist() and rep()

df <- data.frame(
    country = unlist(interesting, use.names = FALSE),
    name = rep(names(interesting), lengths(interesting))
)
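
To see why the two columns line up, it may help to look at the pieces separately. The sketch below rebuilds a list shaped like interesting by hand (an assumption made for the illustration, so that the block runs on its own) and shows that lengths() reports 0 for the NULL entries, so rep() skips those group names entirely.

```r
# Hand-built stand-in for the result of Map(fun, groups)
interesting <- list(
  group1 = c("USA", "JPN"),
  group2 = c("AUS", "GBR"),
  group3 = NULL,
  group4 = NULL,
  group5 = c("A", "B")
)

# Number of countries per group; NULL entries have length 0
counts <- lengths(interesting)

# Each group name is repeated once per country, so empty groups vanish
df <- data.frame(
  country = unlist(interesting, use.names = FALSE),
  name = rep(names(interesting), counts),
  stringsAsFactors = FALSE
)
df
```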

Alternatively, use tidyverse syntax, e.g.,

library(dplyr)
library(tidyr)
interesting %>% 
    tibble(group = names(.), value = .) %>% 
    unnest("value")

The output is

# A tibble: 6 x 2
  group  value
  <chr>  <chr>
1 group1 USA
2 group1 JPN
3 group2 AUS
4 group2 GBR
5 group5 A
6 group5 B

If there are additional problems parsing individual elements of groups, then modify fun, e.g.,

fun = function(x)
    as.character(x[["countries"]])

This will put the output in a list which will handle any number of groups

countries <- unlist(groups, recursive = FALSE)
# "\\D+" consumes only the non-digit prefix, so multi-digit numbers (e.g. group10) stay whole
names(countries) <- sub("^\\D+(\\d+)\\.(\\w+)$", "\\2\\1", names(countries), perl = TRUE)

> countries
$countries1
[1] "USA" "JPN"

$countries2
[1] "AUS" "GBR"

You can simply transform your nested list to a data.frame and then unnest the country column.

library(dplyr)
library(tidyr)
groups %>% 
  tibble(group = names(groups),
         country = .) %>% 
  unnest(country) %>% 
  unnest(country)
#> # A tibble: 4 x 2
#>   group  country
#>   <chr>  <chr>  
#> 1 group1 USA    
#> 2 group1 JPN    
#> 3 group2 AUS    
#> 4 group2 GBR

Created on 2020-01-15 by the reprex package (v0.3.0)

Since the countries are hidden 2 layers deep, you have to run unnest twice. Otherwise I think this is straightforward.
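
If the second unnest() feels awkward, one possible variation is to pull the countries vector out of each group first, so that only one level of nesting is left. The sketch below assumes the purrr, tibble and tidyr packages are installed; country_list and long are names made up for the example.

```r
library(purrr)
library(tibble)
library(tidyr)

groups <- list(
  group1 = list(countries = c("USA", "JPN")),
  group2 = list(countries = c("AUS", "GBR"))
)

# Extract the "countries" vector from each group (one level of nesting remains)
country_list <- map(groups, "countries")

# enframe() turns the named list into a tibble with a list column,
# which a single unnest() then expands to one row per country
long <- enframe(country_list, name = "group", value = "country")
long <- unnest(long, country)
long
```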

If you actually want each vector as an object in your global environment, a combination of purrr::map2/walk and list2env will work. To make this work, we first have to give the countries entries in the list individual names; otherwise list2env just overwrites the same object over and over again.

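library(purrr)
# Suffix each inner name with its group's position (countries1, countries2, ...)
groups <- 
  map2(groups, seq_along(groups), ~setNames(.x, paste0(names(.x), .y)))
# Copy every renamed element into the global environment
walk(groups, ~list2env(.x, envir = .GlobalEnv))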

This creates exactly the objects described in your question. I am not sure, though, whether it is the best solution for a smooth workflow, since I don't know where you are going with this.
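
As a more cautious variation on the same idea, the vectors can be collected into one named list and copied into a separate environment instead of the global one, which avoids overwriting existing objects. This is only a sketch in base R; env and vectors are hypothetical names, and the numbering is taken from the group names rather than from their position.

```r
groups <- list(
  group1 = list(countries = c("USA", "JPN")),
  group2 = list(countries = c("AUS", "GBR"))
)

# One entry per group, holding just its "countries" vector
vectors <- lapply(groups, `[[`, "countries")

# Replace the non-digit prefix of each name: group1 -> countries1, and so on
names(vectors) <- sub("^\\D+", "countries", names(vectors))

# Copy into a dedicated environment (use .GlobalEnv only if that is really wanted)
env <- new.env()
invisible(list2env(vectors, envir = env))

ls(env)
get("countries1", envir = env)
```

Keeping the vectors in the named list itself (vectors$countries1) is often enough and avoids creating separate objects at all.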
