Distinguishing an empty list from an empty data frame

Question

I'm working with an API that seems to be returning malformed data. The API should return nested data frames, but it occasionally returns empty lists as well:

column_name
<list>
<data.frame [1 × 5]>                
<data.frame [0 × 0]>                
<data.frame [0 × 0]>                
<list [0]>
...

After this step I want to use unnest so that the data in the nested data frames can be used downstream. However, the empty lists stop this from happening. What I thought of doing is:

(1) Test to see if the row entry is an empty list
(2) If yes, convert it to an empty data frame; if no, leave it as is

However, my go-to approaches for testing for empty lists have fallen a bit flat, because a data frame is also a list. Currently I'm thinking of using identical or all.equal in conjunction with dim for the test. Namely, if the dimensions of the entry are [1,1], then replace this entry with an empty data frame.

(I am wondering what happens in the case where I have a data frame that has dimensions [1,1] but actually has data in it too...)

Is this the most R way of doing this? I've seen this behavior from the API elsewhere, so I will need to use this functionality in multiple places.

NB: I'm using the tidyverse, if that impacts answers.

Answer 1

A data frame is a special kind of list, but its class is "data.frame". You can test for the class this way:

class(data.frame()) == "list"
# [1] FALSE
class(list()) == "list"
# [1] TRUE
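As a sketch of a slightly more defensive version of the same check: is.data.frame() looks at the whole class vector, so it also works for tibbles, whose class has more than one element. The helper name is_empty_bare_list is made up for this illustration. The last two lines also speak to the dim idea from the question: a bare list has no dimensions at all, while an empty data frame has dimensions 0 and 0.

```r
# Hypothetical helper: TRUE only for a zero-length list that is not a data frame
is_empty_bare_list <- function(x) {
  is.list(x) && !is.data.frame(x) && length(x) == 0
}

is_empty_bare_list(list())             # TRUE
is_empty_bare_list(data.frame())       # FALSE: empty, but a data frame
is_empty_bare_list(data.frame(a = 1))  # FALSE: a 1 x 1 data frame holding data
is_empty_bare_list(list(1))            # FALSE: a list, but not empty

# dim() separates the two empty cases as well
dim(list())        # NULL
dim(data.frame())  # 0 0
```

With a predicate like this, a 1 x 1 data frame that holds real data is never mistaken for an empty list.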

Answer 2

Here is one option using map and if:

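library(dplyr)
library(purrr)
library(tidyr)  # provides nest() and unnest()

# dim() is NULL for a bare list, so swap those entries for an empty data frame
ir %>%
  mutate(data1 = map(data, ~ if (is.null(dim(.x))) data.frame() else .x)) %>%
  unnest(data1)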

Data: Providing copy-paste reproducible data is always useful

ir <- iris %>% group_by(Species) %>% nest()
ir$data[[2]]<-list()
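Since the question mentions needing this in several places, the same idea could be wrapped in a small function. This is only a sketch: the name repair_list_col and the sample data are invented for illustration, and it assumes that any entry which is not a data frame can safely be discarded.

```r
library(purrr)
library(tibble)
library(tidyr)

# Hypothetical helper: replace every non-data-frame entry of a list column
# with an empty data frame, leaving real data frames untouched
repair_list_col <- function(col) {
  map(col, function(x) if (is.data.frame(x)) x else data.frame())
}

# Made-up data mimicking the API output described in the question
df <- tibble(
  id = 1:3,
  payload = list(data.frame(a = 1, b = "x"), data.frame(), list())
)

df$payload <- repair_list_col(df$payload)
all(map_lgl(df$payload, is.data.frame))  # TRUE

# Only id 1 has rows to contribute; empty entries are dropped by default
result <- unnest(df, payload)
result
```

If dropping the empty rows is not wanted, unnest() has a keep_empty argument that retains them with NA values.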

Answer 3

It's often easier to clean the data as soon as you get it from the API. Then everything that follows can rely on safe assumptions.

For this example, create a function that returns a consistently formatted tbl from the API's response. Every tbl will have the same columns, but some of them might be filled with NA if they weren't in the response.

library(tidyr)
library(dplyr)

response_to_df <- function(id = NA_real_,
                           country = NA_character_,
                           wealth = NA_real_,
                           ... # Catch extra columns you don't want
                           ) {
  tibble(id = id, country = country, wealth = wealth)
}

prepare_response_df <- function(response) {
  do.call(response_to_df, response)
}

responses <- list(
  tibble(id = 1:2, country = c("US", "DE"), wealth = c(95, 84)),
  list(),
  tibble(id = 3)
)

# transmute() keeps only the cleaned column, so the raw res column is dropped
tibble(res = responses) %>%
  transmute(nicer = lapply(res, prepare_response_df)) %>%
  unnest(nicer)
# # A tibble: 4 x 3
#      id country wealth
#   <dbl> <chr>    <dbl>
# 1     1 US          95
# 2     2 DE          84
# 3    NA NA          NA
# 4     3 NA          NA
