How to filter a nested list of dataframes by row count and remove filtered dataframes from list in R?

This should be a simple problem to solve, but I am unable to get the exact output I would like. I have a nested list of dataframes, and I would like to filter out all dataframes with fewer than 50 rows and remove them from the list.

Here's a reproducible example of what I have tried:

L <- list(iris,mtcars,iris)
O <- list(iris,mtcars,iris)
H <- list(iris,mtcars,iris)
List <- list(L,O,H)

test <- lapply(List, function(x) lapply(x, function(x) if (nrow(x)<50) NULL else x))

This works for the first list, but it replaces the mtcars dataframes in the nested lists with NULL; it doesn't remove them from the list. Unfortunately, it doesn't loop through the other lists. I have also tried using the filter function:

test <- lapply(List, function(x) lapply(x, function(x) filter(x, nrow(x)>50)))

This has the same issue of not looping through all the lists, and for the first list it leaves me with an empty df which is still an element of the list. My last attempt was a for loop, which I tried on just the first list in the nest. It mostly worked, but I'd like to find a less clunky way to do this if possible. It also returns an error: Error in List[[1]][[ii]]: subscript out of bounds

for (ii in seq_along(List[[1]])) {
  n_rows <- nrow(List[[1]][[ii]])
  if (n_rows < 20) {
    List[[1]][[ii]] <- NULL
  }
}
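
The subscript error most likely comes from the list shrinking during the loop: assigning NULL to List[[1]][[ii]] deletes that element, but seq_along() was evaluated once against the original length, so a later index may no longer exist. A minimal sketch, assuming the 50-row cutoff from the question text (the loop above uses 20) and a hypothetical copy named first, that iterates in reverse so a deletion never shifts an index still to be visited:

```r
# Work on a copy of the first sub-list (hypothetical name: first).
first <- list(iris, mtcars, iris)

# Walk the indices from last to first: removing element ii only
# renumbers the elements after ii, which have already been visited.
for (ii in rev(seq_along(first))) {
  if (nrow(first[[ii]]) < 50) {
    first[[ii]] <- NULL  # deletes the element and shortens the list
  }
}

length(first)
```

This keeps the loop form, but the functional approaches are shorter and avoid modifying a list while indexing into it.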

I am hopeful there is a simple solution just around the corner!

One option could be:

lapply(List, function(x) Filter(function(y) nrow(y) >= 50, x))
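
Filter() returns only the elements for which the predicate is TRUE, so the small dataframes are dropped outright instead of being left behind as NULL entries. A quick check, rebuilding the example data from the question (the name result is just for illustration):

```r
# Example data from the question.
L <- list(iris, mtcars, iris)
O <- list(iris, mtcars, iris)
H <- list(iris, mtcars, iris)
List <- list(L, O, H)

# Outer lapply() visits each sub-list; Filter() keeps the large dataframes.
result <- lapply(List, function(x) Filter(function(y) nrow(y) >= 50, x))

# Number of dataframes remaining in each sub-list.
lengths(result)

# Row counts of the surviving dataframes in each sub-list.
lapply(result, function(x) vapply(x, nrow, integer(1)))
```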

With the purrr library:

List %>% map(~keep(.x, ~nrow(.x) >= 50))

Here is an option with sapply/lapply:

lapply(List, function(x) x[sapply(x, nrow)>=50])
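
sapply() picks its return type from the data, whereas vapply() is told the expected type up front and raises an error if an element does not match. A sketch of the same indexing approach with vapply(), assuming every element of each sub-list is a dataframe; the helper name keep_large and its min_rows parameter are hypothetical:

```r
# Hypothetical helper: in every sub-list of a two-level nested list,
# keep only the dataframes with at least `min_rows` rows.
keep_large <- function(nested, min_rows = 50) {
  lapply(nested, function(x) {
    # vapply() guarantees exactly one integer row count per element.
    n <- vapply(x, nrow, integer(1))
    x[n >= min_rows]
  })
}

L <- list(iris, mtcars, iris)
List <- list(L, L, L)

filtered <- keep_large(List)
lengths(filtered)
```

Wrapping the cutoff in a parameter also makes it easy to reuse the same call with a different threshold.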
