Distinguish between an empty list or an empty data frame

I'm working with an API that seems to be returning malformed data. The API should return nested data frames, but on occasion it also returns empty lists:

column_name
<list>
<data.frame [1 × 5]>                
<data.frame [0 × 0]>                
<data.frame [0 × 0]>                
<list [0]>
...

After this step I want to use unnest so that I can work with the data in the nested data frames downstream. However, the empty lists stop this from happening. What I thought of doing is:

  • (1) Test to see if the row entry is an empty list
  • (2) If yes, convert it to an empty data frame; if no, leave it as is

However, my go-to approaches for testing for empty lists have fallen a bit flat, because a data frame is itself a list. Currently I'm thinking of using identical or all.equal in conjunction with dim for the test. Namely, if the dimensions of the entry are [1,1], then replace this entry with an empty data frame.

(I am wondering what happens in the case where I have a data frame with dimensions [1,1] that actually has data in it too...)
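
To see why a dimension-based test is shaky, it may help to look at what dim() returns for each kind of entry. The following is a minimal base R sketch; the object names are made up for illustration and are not from the API in question:

```r
empty_list <- list()
empty_df   <- data.frame()
one_by_one <- data.frame(x = 1)

dim(empty_list)   # NULL: a plain list has no dim attribute
dim(empty_df)     # 0 0
dim(one_by_one)   # 1 1: a data frame that really does hold data

is.list(empty_df)          # TRUE, so is.list() cannot tell the two apart
is.data.frame(empty_list)  # FALSE
```

Under these assumptions, a check on the class of the entry looks more reliable than a check on its dimensions, since a genuine 1 × 1 data frame would otherwise be caught as well.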

Is this the most R-like way of doing this? I've seen this behavior from the API elsewhere, so I will need to use this functionality in multiple places.

NB: I'm using the tidyverse, if that impacts answers.

A data frame is a special kind of list, but its class is data.frame. You can test for the class this way:

class(data.frame()) == "list"
# [1] FALSE
class(list()) == "list"
# [1] TRUE
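
Comparing class() with == can misfire when an object carries more than one class (a tibble has three), so a predicate such as is.data.frame() may be a safer test. Here is a minimal sketch of the replace step in base R; `fix_entry` and `entries` are hypothetical names, and note that this version converts anything that is not a data frame, not only empty lists:

```r
# Hypothetical helper: keep data frames, turn anything else into an empty data frame
fix_entry <- function(x) {
  if (is.data.frame(x)) x else data.frame()
}

# Stand-in for the list column returned by the API
entries <- list(data.frame(a = 1:2), data.frame(), list())
fixed <- lapply(entries, fix_entry)

vapply(fixed, is.data.frame, logical(1))  # all TRUE
vapply(fixed, nrow, integer(1))           # 2 0 0
```

Because the helper is an ordinary function, it could be reused wherever the same API behavior shows up.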

Here is one option using map and if:

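library(dplyr)
library(purrr)
library(tidyr)  # for unnest()

# Entries without a dim attribute (plain lists) become empty data frames
ir %>%
  mutate(data1 = map(data, ~ if (is.null(dim(.x))) data.frame() else .x)) %>%
  unnest(data1)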

Data: providing copy-paste reproducible data is always useful.

ir <- iris %>% group_by(Species) %>% nest()
ir$data[[2]]<-list()

It's often easier to clean the data as soon as you get it from the API. Then everything that follows can rely on safe assumptions.

For this example, create a function that returns a consistently formatted tbl from the API's response. Every tbl will have the same columns, but some of them may be filled with NA if they weren't in the response.

library(tidyr)
library(dplyr)

response_to_df <- function(id = NA_real_,
                           country = NA_character_,
                           wealth = NA_real_,
                           ... # Catch extra columns you don't want
                           ) {
  tibble(id = id, country = country, wealth = wealth)
}

prepare_response_df <- function(response) {
  do.call(response_to_df, response)
}

responses <- list(
  tibble(id = 1:2, country = c("US", "DE"), wealth = c(95, 84)),
  list(),
  tibble(id = 3)
)

tibble(res = responses) %>%
  mutate(nicer = lapply(res, prepare_response_df)) %>%
  unnest(nicer)
# # A tibble: 4 x 3
#      id country wealth
#   <dbl> <chr>    <dbl>
# 1     1 US          95
# 2     2 DE          84
# 3    NA NA          NA
# 4     3 NA          NA
