How do I filter out the row with the nested list but not those with the tibble?
How to filter nested tibble if one row contains list()/no nested tibble
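I am struggling to filter a nested tibble when one row does not contain a nested tibble.

my_df contains a nested tibble in the column products. I want to filter each nested tibble so that it keeps only the rows whose food column contains the value apple. I can do this with mutate(products = map(products, ~filter(.x, str_detect(food, "apple")))). However, this fails when my_df has a row that contains no nested tibble, or an empty one (list()).

I tried to work around this by creating a helper column that holds the number of rows of each nested tibble, and then applying the search only to the rows where nrow > 0. However, my approach with case_when fails, and I don't know why.

I would be grateful for any hints. Note that I know I could split my_df into two separate data frames (one with list(), one with nested tibbles) and then combine them again with bind_rows. The case_when approach seems more convenient for my use case, though, and I would like to understand why it does not work. Reprex below. Many thanks!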
library(tidyverse)
my_df <- structure(list(branch_name = c("basket1", "basket2"), products = list(
structure(list(), class = c(
"tbl_df", "tbl",
"data.frame"
), row.names = integer(0), .Names = character(0)),
structure(list(
food = c(
"apple",
"grape"
),
supplier = c("john", "jack")),
class = c("tbl_df", "tbl", "data.frame"),
row.names = c(NA, -2L)
)
)), row.names = c(NA, -2L), class = c(
"tbl_df",
"tbl", "data.frame"
))
my_df
#> # A tibble: 2 x 2
#> branch_name products
#> <chr> <list>
#> 1 basket1 <tibble [0 x 0]>
#> 2 basket2 <tibble [2 x 2]>
# Try to filter the nested df 'products': keep only rows where str_detect(food, "apple") is TRUE
# Fails
x <- my_df %>%
  mutate(products = map(products, ~ filter(.x, str_detect(food, "apple"))))
#> Error in `mutate_cols()`:
#> ! Problem with `mutate()` column `products`.
#> i `products = map(products, ~filter(.x, str_detect(food, "apple")))`.
#> x Problem with `filter()` input `..1`.
#> i Input `..1` is `str_detect(food, "apple")`.
#> x object 'food' not found
#> Caused by error in `h()`:
#> ! Problem with `filter()` input `..1`.
#> i Input `..1` is `str_detect(food, "apple")`.
#> x object 'food' not found
# filter() works once the rows with an empty nested tibble have been dropped
y <- my_df %>%
  mutate(products_nrow = map_dbl(products, nrow)) %>%
  filter(products_nrow > 0) %>%
  mutate(products = map(products, ~ filter(.x, str_detect(food, "apple"))))
# Correct result
y
#> # A tibble: 1 x 3
#> branch_name products products_nrow
#> <chr> <list> <dbl>
#> 1 basket2 <tibble [1 x 2]> 2
y$products
#> [[1]]
#> # A tibble: 1 x 2
#> food supplier
#> <chr> <chr>
#> 1 apple john
# Account for the number of rows of each nested df and use case_when(); fails
my_df %>%
  mutate(products_nrow = map_dbl(products, nrow)) %>%
  mutate(products = case_when(
    products_nrow > 0 ~ map(products, ~ filter(.x, str_detect(food, "apple"))),
    TRUE ~ products
  ))
#> Error in `mutate_cols()`:
#> ! Problem with `mutate()` column `products`.
#> i `products = case_when(...)`.
#> x Problem with `filter()` input `..1`.
#> i Input `..1` is `str_detect(food, "apple")`.
#> x object 'food' not found
#> Caused by error in `h()`:
#> ! Problem with `filter()` input `..1`.
#> i Input `..1` is `str_detect(food, "apple")`.
#> x object 'food' not found
Created on 2022-03-18 by the reprex package (v2.0.1)
You can use an if condition to check whether the nested tibble has a column named food:
library(dplyr)
library(purrr)
library(stringr)  # provides str_detect()
my_df %>%
  mutate(products = map(products, ~ if ("food" %in% names(.x)) {
    filter(.x, str_detect(food, "apple"))
  } else {
    .x
  }))
#> # A tibble: 2 × 2
#> branch_name products
#> <chr> <list>
#> 1 basket1 <tibble [0 × 0]>
#> 2 basket2 <tibble [1 × 2]>
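As for why the case_when attempt in the question fails: as far as I understand dplyr's documented behaviour, case_when() is not a control-flow statement. It evaluates every right-hand side on the whole column first and only then picks values according to the conditions, so map(products, ~filter(...)) still runs on the empty tibble. A minimal sketch of that behaviour with a plain numeric vector (n, calls and slow_double are made-up names for illustration, not part of the question):

```r
library(dplyr)

n <- c(0, 2)

# Hypothetical helper: counts how many elements it is asked to process
calls <- 0
slow_double <- function(x) {
  calls <<- calls + length(x)
  x * 2
}

out <- case_when(
  n > 0 ~ slow_double(n),  # evaluated for the full vector, not only where n > 0
  TRUE ~ n
)

out
#> [1] 0 4
calls
#> [1] 2
```

slow_double() receives both elements even though the condition is FALSE for the first one. This is why the guard has to live inside the mapped function, as in the if condition above, rather than around the map() call.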
Another possible solution:
library(tidyverse)
my_df[["products"]] <- map(
  my_df[["products"]],
  # Filter only the non-empty nested tibbles; return empty ones unchanged
  ~ if (nrow(.x) != 0) filter(.x, food == "apple") else .x
)
my_df
#> # A tibble: 2 × 2
#> branch_name products
#> <chr> <list>
#> 1 basket1 <tibble [0 × 0]>
#> 2 basket2 <tibble [1 × 2]>
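The same guard can also be written with purrr::map_if(), which applies a function only to the elements for which a predicate is TRUE and returns the others unchanged. A minimal sketch, assuming the data from the question (rebuilt here with tibble() for brevity; res is just an illustrative name):

```r
library(dplyr)
library(purrr)
library(stringr)
library(tibble)

# Rebuild the question's data: one empty nested tibble, one with two rows
my_df <- tibble(
  branch_name = c("basket1", "basket2"),
  products = list(
    tibble(),
    tibble(food = c("apple", "grape"), supplier = c("john", "jack"))
  )
)

# Filter only the nested tibbles that actually have a `food` column;
# map_if() leaves every other element untouched
res <- my_df %>%
  mutate(products = map_if(
    products,
    ~ "food" %in% names(.x),
    ~ filter(.x, str_detect(food, "apple"))
  ))

# Number of rows left in each nested tibble
map_int(res$products, nrow)
#> [1] 0 1
```

Checking for the column instead of nrow(.x) != 0 also covers a nested tibble that has rows but no food column, which may or may not matter for your data.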
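A hacky solution that does not directly answer your question, but probably the simplest thing to do is to unnest (which drops the empty tibbles) and nest again before applying your filter: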
my_df %>%
  unnest(products) %>%
  nest(products = -branch_name) %>%
  mutate(products = map(products, ~ filter(.x, str_detect(food, "apple"))))
Which results in:
# A tibble: 1 × 2
branch_name products
<chr> <list>
1 basket2 <tibble [1 × 2]>
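Note that this drops basket1 from the result. If the empty baskets should stay, one option is to filter the flat table and then join the result back onto the original branch names. A minimal sketch, assuming the data from the question (rebuilt with tibble(); filtered and restored are illustrative names):

```r
library(dplyr)
library(tidyr)
library(stringr)
library(tibble)

# Rebuild the question's data: one empty nested tibble, one with two rows
my_df <- tibble(
  branch_name = c("basket1", "basket2"),
  products = list(
    tibble(),
    tibble(food = c("apple", "grape"), supplier = c("john", "jack"))
  )
)

# Filter on the flat table, then nest again (empty baskets are dropped here)
filtered <- my_df %>%
  unnest(products) %>%
  filter(str_detect(food, "apple")) %>%
  nest(products = -branch_name)

# Join back onto the original branch names to restore the dropped rows;
# branches without a match get NULL in the list column
restored <- my_df %>%
  select(branch_name) %>%
  left_join(filtered, by = "branch_name")

restored$branch_name
#> [1] "basket1" "basket2"
```

The restored rows hold NULL rather than an empty tibble, and a basket whose products contain no apple at all ends up as NULL too, so this is only a fit if that distinction does not matter downstream.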