R: Converting lists of different nesting degree into data frames

Introduction

I am trying to convert the output of a Census API call (saved as an .rds file) into an R data frame. For convenience, let's call the object 'x'.

  • Object x is a list where each element is a US county.
  • Each county is also a list.
  • Each element of the county list is a block group.
  • Each block group contains a constant number of elements (let's call it z). If any of those elements is the value "NULL", the block group is a list. If none of them is, the block group is a character vector.
  • When a block group is a list, each of its elements is of class "NULL" if its value is "NULL" and of class "character" otherwise.
  • I know the number of counties but none of the other lengths. Each county may have a different number of block groups, but every block group has the same z elements regardless of its class.

More precisely

  • Each element of x is a list

      # Both of the below return 'list'
      class(x[i])
      class(x[[i]])
  • Each element of that list is either...

    • A character vector

       # Returns 'list'
       class(x[[i]][k])
       # Returns 'character'
       class(x[[i]][[k]])
    • A list

       # Returns 'list'
       class(x[[i]][k])
       # Returns 'list'
       class(x[[i]][[k]])

Whether the element is a list or a character vector is determined by whether the value "NULL" appears in the row of data. If any element of the row is "NULL", the element is a list. If no element of the row is "NULL", the element is a character vector.

  • If the above is a list, each element of the list is either of class "NULL" if the value is NULL or class character if the value is not "NULL"

      # Returns 'list'
      class(x[[i]][[k]][g])
      # Returns "NULL" if "NULL" else "character"
      class(x[[i]][[k]][[g]])
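
To make the description above concrete, here is a tiny hand-built object with the same shape. The name x_toy and all of its values are invented for illustration and are not taken from the Census data; it also assumes the missing entries are true NULL objects, as the class checks above suggest.

```r
# Hypothetical miniature of the structure described above (values invented).
x_toy <- list(
  list(                                  # county 1
    c("1.2", "3.4", "5.6"),              # block group without NULLs: character vector
    list(NULL, "7.8", NULL)              # block group with NULLs: list
  )
)

class(x_toy[[1]])            # "list"
class(x_toy[[1]][[1]])       # "character"
class(x_toy[[1]][[2]])       # "list"
class(x_toy[[1]][[2]][[1]])  # "NULL"
class(x_toy[[1]][[2]][[2]])  # "character"

# Both kinds of block group have the same length z (here 3).
lengths(x_toy[[1]])
```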

Question

Can anyone propose a method for converting this into a data frame? I am having enormous difficulty with figuring out how to convert the block group elements into an object that I can apply() or loop across.

EDIT: An example of the data

In response to requests for a reproducible example, see the code below. It demonstrates a small version of the data I have (my data contains many counties, block groups, and variables). Notice that the length of each block group vector or list equals the number of variables, because each element holds the block group's value for the corresponding variable. My goal is to produce a data frame with column names var1, var2, var3, var4 in which each row holds the values for one block group.

set.seed(5) 

# County 1
bezz <- c("var1","var2","var3","var4")          # variable names
bizz <- as.character(round(rnorm(4),2))         # block group 1.1
buzz <- list("NULL","NULL","2","94389")         # block group 1.2
bozz <- as.character(round(rnorm(4),2))         # block group 1.3
bazz <- list("NULL","NULL","888888888","NULL")  # block group 1.4
foo <- list(bezz, bizz,buzz,bozz,bazz)          # county 1 object

# County 2
fezz <- c("var1","var2","var3","var4")          # variable names
fizz <- list("NULL","2","NULL","94389")         # block group 2.1
fuzz <- as.character(round(rnorm(4),2))         # block group 2.2
fozz <- as.character(round(rnorm(4),2))         # block group 2.3
bar <- list(fezz, fizz,fuzz,fozz)               # county 2 object

# County 3
lezz <- c("var1","var2","var3","var4")          # variable names
luzz <- as.character(round(rnorm(4),2))         # block group 3.1
baz <- list(lezz, luzz)                         # county 3 object

# API output
mydata <- list(foo,bar,baz)                     # all counties in a list 

This solution requires that all NULLs be converted to NAs. Since all the data appear to be numeric, as.numeric() has been used; just remove it if that is not what you want.

This may take a while to run, and there may be more efficient ways to go about it. The two loops could be merged into one, but for the sake of clarity the NULL-to-NA loop has been kept separate.

have <- readRDS("~/R/SO/acs0509_block_group_call.Rds")

# replace NULL's with NA's
for(i in seq_along(have)) {
  for(j in seq_along(have[[i]])) {
    for(k in seq_along(have[[i]][[j]])) {
      have[[i]][[j]][[k]] <- ifelse(is.null(have[[i]][[j]][[k]]),NA,have[[i]][[j]][[k]])
    }
  }
}

# initiate "want" data.frame with an arbitrary row
want <- as.data.frame(t(as.numeric(have[[1]][[2]])))
colnames(want) <- have[[1]][[1]]

ins.row <- 1

for(i in seq_along(have)) {
  # skip element 1 (the variable names); also safe if a county has no block groups
  for(j in seq_along(have[[i]])[-1]) {
    if(is.list(have[[i]][[j]]))
      want[ins.row,] <- as.numeric(unlist(have[[i]][[j]]))
    else
      want[ins.row,] <- as.numeric(have[[i]][[j]])
    ins.row <- ins.row + 1
  }
}
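
As a possible alternative to the two loops above, the NULL handling and the row building can be folded into one pass with lapply() and do.call(rbind, ...). This is only a sketch and has not been run against the Census file. The helper names block_to_numeric and nested_to_df and the toy input are hypothetical. It assumes the first element of every county holds the variable names, and it treats both a true NULL and the string "NULL" as missing, since the example data uses the string while the loop above tests with is.null().

```r
# Hypothetical helper: turn one block group (a character vector, or a list
# that may contain NULL or the string "NULL") into a numeric vector,
# with NA for missing values.
block_to_numeric <- function(bg) {
  vals <- vapply(bg, function(el) {
    if (is.null(el) || identical(el, "NULL")) NA_character_ else as.character(el)
  }, character(1), USE.NAMES = FALSE)
  as.numeric(vals)
}

# Hypothetical helper: convert the whole nested list to one data frame.
# Assumes element 1 of every county holds the variable names.
nested_to_df <- function(x) {
  rows <- lapply(x, function(county) {
    # one matrix per county, one row per block group
    do.call(rbind, lapply(county[-1], block_to_numeric))
  })
  out <- as.data.frame(do.call(rbind, rows))
  colnames(out) <- x[[1]][[1]]
  out
}

# Small made-up input in the same shape as the example data
toy <- list(
  list(c("var1", "var2"), c("1.5", "2"), list(NULL, "3")),
  list(c("var1", "var2"), list("NULL", "4"))
)
want2 <- nested_to_df(toy)
```

Building all rows first and binding them once avoids growing the data frame row by row, which is usually the slow part of the loop version. With the example data from the question, nested_to_df(mydata) would be the corresponding call.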
