Distinguish between an empty list and an empty data frame
I'm working with an API that seems to be returning malformed data. The API should return nested data frames, but occasionally returns empty lists as well:
column_name
<list>
<data.frame [1 × 5]>
<data.frame [0 × 0]>
<data.frame [0 × 0]>
<list [0]>
...
After this step I want to use unnest so that the data in the nested data frames is available downstream. However, the empty lists stop this from happening. My idea is to detect the empty lists and replace them with empty data frames.
However, my go-to approaches for testing for empty lists have fallen a bit flat, as a data frame is a list. Currently I'm thinking of using identical or all.equal in conjunction with dim for the test. Namely, if the dimensions of the entry are [1,1], then replace this entry with an empty data frame.
(I am wondering what happens when a data frame has dimensions [1,1] but actually has data in it too...)
Is this the most R way of doing this? I've seen this behavior from the API elsewhere, so I will need to use this functionality in multiple places.
NB: I'm using the tidyverse, if that impacts answers.
A data frame is a special kind of list, but its class is data.frame. You can test for the class this way:
class(data.frame()) == "list"
# [1] FALSE
class(list()) == "list"
# [1] TRUE
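The class test above can be folded into a small predicate. This is a minimal sketch in base R; the name is_empty_plain_list is hypothetical, and it uses is.data.frame() rather than comparing class() to a string, on the assumption that tibbles (whose class vector has several entries) may also appear in the column:

```r
# Hypothetical helper: TRUE only for a plain empty list, never for a data frame.
is_empty_plain_list <- function(x) {
  is.list(x) && !is.data.frame(x) && length(x) == 0
}

is_empty_plain_list(list())             # TRUE
is_empty_plain_list(data.frame())       # FALSE: empty, but a data frame
is_empty_plain_list(data.frame(a = 1))  # FALSE: a [1,1] data frame with data
is_empty_plain_list(list(1))            # FALSE: a list, but not empty
```

Because the check never looks at dimensions, a 1 x 1 data frame that holds real data is left alone.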
Here is one option using map and if:
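library(dplyr)
library(purrr)
library(tidyr)  # provides nest() and unnest()

# Replace any entry without dimensions (such as list()) with an empty data frame
ir %>%
  mutate(data1 = map(data, ~ if (is.null(dim(.x))) data.frame() else .x)) %>%
  unnest(data1)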
Data: Providing copy-paste reproducible data is always useful
ir <- iris %>% group_by(Species) %>% nest()
ir$data[[2]]<-list()
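Since the question mentions needing this in several places, the same check could be wrapped in a reusable function. A minimal sketch, assuming a hypothetical name fix_empty_entries and using base lapply in place of map so that it runs without extra packages. It relies on dim(list()) being NULL while dim(data.frame()) is 0 0:

```r
# Hypothetical helper: swap anything without dimensions (e.g. list()) for an
# empty data frame, leaving real data frames untouched.
fix_empty_entries <- function(col) {
  lapply(col, function(x) if (is.null(dim(x))) data.frame() else x)
}

# A list column mixing a filled data frame, an empty list and an empty data frame
col <- list(data.frame(a = 1:2), list(), data.frame())
fixed <- fix_empty_entries(col)

# Every entry is now a data frame
vapply(fixed, is.data.frame, logical(1))
```

The function could then be called inside mutate on whichever list column the API returns.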
It's often easier to clean the data as soon as you get it from the API. Then everything that follows can rely on safe assumptions.
For this example, create a function that returns a consistently formatted tbl using the API's response. Every tbl will have the same columns, but some of them might be filled with NA if they weren't in the response.
library(tidyr)
library(dplyr)

# Build one tibble per response; missing fields fall back to typed NA defaults
response_to_df <- function(id = NA_real_,
                           country = NA_character_,
                           wealth = NA_real_,
                           ... # Catch extra columns you don't want
                           ) {
  tibble(id = id, country = country, wealth = wealth)
}

# Pass the elements of a response (a data frame or a list) as named arguments
prepare_response_df <- function(response) {
  do.call(response_to_df, response)
}

responses <- list(
  tibble(id = 1:2, country = c("US", "DE"), wealth = c(95, 84)),
  list(),
  tibble(id = 3)
)

tibble(res = responses) %>%
  mutate(nicer = lapply(res, prepare_response_df)) %>%
  select(nicer) %>%  # drop the raw list column so only the cleaned columns remain
  unnest(nicer)
# # A tibble: 4 x 3
#      id country wealth
#   <dbl> <chr>    <dbl>
# 1     1 US          95
# 2     2 DE          84
# 3    NA NA          NA
# 4     3 NA          NA
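The approach above depends on how do.call interacts with default arguments: an empty list supplies no arguments, so every default applies, and unknown fields are absorbed by the dots. A minimal sketch of just that mechanism, using a hypothetical stand-in to_row built on base data.frame() so that it runs without any packages:

```r
# Hypothetical stand-in for response_to_df(), base R only
to_row <- function(id = NA_real_, country = NA_character_, ...) {
  data.frame(id = id, country = country)
}

# An empty list passes no arguments, so the result is a single row of NA
empty_row <- do.call(to_row, list())

# Supplied fields override the defaults; the unknown "extra" field is swallowed by ...
partial <- do.call(to_row, list(id = 3, extra = "ignored"))

empty_row
partial
```

Both results have the same columns, which is what lets unnest combine them afterwards.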