
Distinguish between an empty list and an empty data frame

I'm working with an API that seems to be returning malformed data. The API should return nested data frames, but on occasion it returns empty lists too:

column_name
<list>
<data.frame [1 × 5]>                
<data.frame [0 × 0]>                
<data.frame [0 × 0]>                
<list [0]>
...

After this step I want to use unnest to use the data in the nested data frames downstream. However, the empty lists stop this from happening. What I thought of doing is:

  • (1) Test to see if the row entry is an empty list
  • (2) If yes, convert to an empty data frame; if no, leave as is

However, my go-to approaches for testing for empty lists have fallen a bit flat, as a data frame is also a list. Currently I'm thinking of using identical or all.equal in conjunction with dim for the test. Namely, if the dimensions of the entry are [1,1], then replace this entry with an empty data frame.

(I am wondering what happens in the case where I have a data frame with dimensions [1,1] but actually has data in it too...)

Is this the most R way of doing this? I've seen this behavior from the API elsewhere, so I will need to use this functionality in multiple places.

NB I'm using the tidyverse, if that impacts answers.

A data frame is a special kind of list, but its class is "data.frame". You can test for the class this way:

class(data.frame()) == "list"
# [1] FALSE
class(list()) == "list"
# [1] TRUE
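
One caveat with comparing class() to a string: a tibble carries several classes, so the comparison returns a vector rather than a single TRUE or FALSE. A minimal sketch of an alternative, using base R's is.data.frame(), which covers both plain data frames and tibbles. The helper name is_bare_empty_list is made up for illustration:

```r
# Hypothetical helper (not from the original answer):
# TRUE only for a zero-length list that is not a data frame.
is_bare_empty_list <- function(x) {
  is.list(x) && !is.data.frame(x) && length(x) == 0
}

is_bare_empty_list(list())             # TRUE
is_bare_empty_list(data.frame())       # FALSE, an empty data frame is left alone
is_bare_empty_list(data.frame(a = 1))  # FALSE
is_bare_empty_list(list(1))            # FALSE, the list is not empty
```

Because the test returns a single logical value, it can be used directly inside if().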

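Here is one option using map and if:

library(dplyr)
library(purrr)
library(tidyr)  # provides unnest()

ir %>%
  # a bare list has no dimensions, so dim() returns NULL; swap it for an empty data frame
  mutate(data1 = map(data, ~ if (is.null(dim(.x))) data.frame() else .x)) %>%
  unnest(data1)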

Data: Providing copy-paste reproducible data is always useful.

ir <- iris %>% group_by(Species) %>% nest()
ir$data[[2]] <- list()
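
To see why the dim() test separates the two cases, and what happens to a data frame with dimensions [1,1], here is a small base R sketch. The helper fix_empty_lists is a hypothetical name, and lapply stands in for purrr::map so the snippet needs no packages:

```r
dim(list())             # NULL: a bare list has no dimensions
dim(data.frame())       # 0 0
dim(data.frame(a = 1))  # 1 1

# Hypothetical reusable helper: replace entries without dimensions
# by an empty data frame and keep every data frame as is.
fix_empty_lists <- function(col) {
  lapply(col, function(x) if (is.null(dim(x))) data.frame() else x)
}

col <- list(data.frame(a = 1), data.frame(), list())
fixed <- fix_empty_lists(col)
vapply(fixed, is.data.frame, logical(1))  # TRUE TRUE TRUE
```

A 1 x 1 data frame that holds real data is left untouched, since its dim() is not NULL. Note that this test would also replace a non-empty bare list, so it assumes the API only ever returns data frames or empty lists.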

It's often easier to clean the data as soon as you get it from the API. Then everything that follows can rely on safe assumptions.

For this example, create a function that returns a consistently formatted tbl using the API's response. Every tbl will have the same columns, but some of them might be filled with NA if they weren't in the response.

library(tidyr)
library(dplyr)

response_to_df <- function(id = NA_real_,
                           country = NA_character_,
                           wealth = NA_real_,
                           ... # Catch extra columns you don't want
                           ) {
  tibble(id = id, country = country, wealth = wealth)
}

prepare_response_df <- function(response) {
  do.call(response_to_df, response)
}

responses <- list(
  tibble(id = 1:2, country = c("US", "DE"), wealth = c(95, 84)),
  list(),
  tibble(id = 3)
)

tibble(res = responses) %>%
  mutate(nicer = lapply(res, prepare_response_df)) %>%
  unnest(nicer)
# # A tibble: 4 x 3
#      id country wealth
#   <dbl> <chr>    <dbl>
# 1     1 US          95
# 2     2 DE          84
# 3    NA NA          NA
# 4     3 NA          NA
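
The NA row in the output comes from how do.call treats the empty list: no arguments are passed, so every parameter falls back to its default. A minimal base R sketch of that mechanism, where to_df is a hypothetical stand-in for response_to_df that uses data.frame instead of tibble:

```r
# Hypothetical stand-in for response_to_df, built on base data.frame.
to_df <- function(id = NA_real_, country = NA_character_, ...) {
  data.frame(id = id, country = country)
}

# Empty response: every argument falls back to its default.
empty <- do.call(to_df, list())

# Partial response with an unexpected field: `extra` is absorbed by `...`.
partial <- do.call(to_df, list(id = 3, extra = "x"))

nrow(empty)     # 1, a single row of NAs
names(partial)  # "id" "country"
```

If rows for empty responses are not wanted downstream, they would need to be filtered out after unnesting, for example by dropping rows where id is NA.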

