简体   繁体   中英

Subsetting a list of data frames by condition

Sorry I can't embed pictures yet

I have 21 data frames in a list ( listb ), all with the same headings of Timestamp , Rainfall I would like to sort them by Rainfall (descending) and then subset the top 30 (to include the corresponding Timestamp ) of each of the 21 data frames. Then put them back into a single dataframe with the name of the initial data frame as a heading?

Please find the list of data frames below, and a small cut from the b1 dataframe

Would I need to create a new dataframe for each of the new subsets then combine them into a list later?

Descending_b1 <- listb$b1[order(-Rainfall),]

b1_30 <- Descending_b1[1:30,1:2]

From that, I produce the following

b1_30 <- structure(list(Timestamp = c("25/1/2013", "24/1/2013", "2/2/2004", 
                             "21/3/2010", "16/7/2016", "1/2/2010", "26/1/2007", "29/12/1998", 
                             "24/2/2008", "5/2/2003", "6/2/2003", "11/11/2001", "3/12/2010", 
                             "8/3/2020", "27/12/2010", "29/1/1998", "18/10/2017", "13/3/2007", 
                             "5/4/2006", "10/6/2006", "19/11/2008", "20/2/2015", "26/3/2014", 
                             "15/3/2017", "27/8/2011", "1/3/2013", "27/8/1998", "11/2/2012", 
                             "11/2/2008", "26/1/2013"), 
                        Rainfall = c(238L, 158L, 131L, 131L,129L, 122L, 112L, 109L, 101L, 94L, 
                                     92L, 88L, 82L, 81L, 78L, 74L, 71L, 69L, 65L, 64L, 64L, 
                                     64L, 63L, 63L, 62L, 61L, 60L, 60L, 58L,57L)), 
                   row.names = c(5915L, 5914L, 2640L, 4874L, 7183L, 4826L, 3725L, 939L, 4118L, 2278L, 2279L, 1827L, 5131L, 8514L, 5155L, 
                              605L, 7642L, 3771L, 3429L, 3495L, 4387L, 6671L, 6340L, 7425L, 
                              5398L, 5950L, 815L, 5566L, 4105L, 5916L), class = "data.frame")
b1_30
#>       Timestamp Rainfall
#> 5915  25/1/2013      238
#> 5914  24/1/2013      158
#> 2640   2/2/2004      131
#> 4874  21/3/2010      131
#> 7183  16/7/2016      129
#> 4826   1/2/2010      122
#> 3725  26/1/2007      112
#> 939  29/12/1998      109
#> 4118  24/2/2008      101
#> 2278   5/2/2003       94
#> 2279   6/2/2003       92
#> 1827 11/11/2001       88
#> 5131  3/12/2010       82
#> 8514   8/3/2020       81
#> 5155 27/12/2010       78
#> 605   29/1/1998       74
#> 7642 18/10/2017       71
#> 3771  13/3/2007       69
#> 3429   5/4/2006       65
#> 3495  10/6/2006       64
#> 4387 19/11/2008       64
#> 6671  20/2/2015       64
#> 6340  26/3/2014       63
#> 7425  15/3/2017       63
#> 5398  27/8/2011       62
#> 5950   1/3/2013       61
#> 815   27/8/1998       60
#> 5566  11/2/2012       60
#> 4105  11/2/2008       58
#> 5916  26/1/2013       57

So yeah I hope to do that with the rest of the data frames within the list to create a new data frame whilst keeping the initial data frame name, and then combine them into a new list

Suppose you have a list like this

set.seed(2021)
listb <- list(b1 = data.frame(Timestamp = as.Date("2010-01-01") + days(sample(1:100, 10)),
              Rainfall = sample(200:300, 10)),
b2 = data.frame(Timestamp = as.Date("2010-01-01") + days(sample(1:100, 10)),
                Rainfall = sample(200:300, 10)),
b3 = data.frame(Timestamp = as.Date("2010-01-01") + days(sample(1:100, 10)),
                Rainfall = sample(200:300, 10)))

> listb
$b1
    Timestamp Rainfall
1  2010-01-08      275
2  2010-02-08      250
3  2010-02-16      259
4  2010-02-28      217
5  2010-01-13      298
6  2010-03-12      202
7  2010-03-06      245
8  2010-04-10      225
9  2010-03-11      235
10 2010-01-24      285

$b2
    Timestamp Rainfall
1  2010-02-01      242
2  2010-04-09      258
3  2010-01-20      269
4  2010-03-10      285
5  2010-03-28      298
6  2010-01-06      262
7  2010-03-15      278
8  2010-03-05      233
9  2010-02-08      221
10 2010-01-19      215

$b3
    Timestamp Rainfall
1  2010-03-21      216
2  2010-03-30      240
3  2010-01-18      230
4  2010-01-21      272
5  2010-03-10      292
6  2010-04-05      226
7  2010-03-14      210
8  2010-03-25      235
9  2010-03-09      237
10 2010-01-03      278

Now you need to do this only (Needless to say replace n argument in slice_max with your desired n=30)

purrr::map2_dfr(listb, names(listb), ~ .x %>% 
          mutate(list_name = .y) %>%
           slice_max(Rainfall, n=5))

    Timestamp Rainfall list_name
1  2010-01-13      298        b1
2  2010-01-24      285        b1
3  2010-01-08      275        b1
4  2010-02-16      259        b1
5  2010-02-08      250        b1
6  2010-03-28      298        b2
7  2010-03-10      285        b2
8  2010-03-15      278        b2
9  2010-01-20      269        b2
10 2010-01-06      262        b2
11 2010-03-10      292        b3
12 2010-01-03      278        b3
13 2010-01-21      272        b3
14 2010-03-30      240        b3
15 2010-03-09      237        b3

If you want to return the output back into a similar list

purrr::map(listb,  ~ .x %>% 
               slice_max(Rainfall, n=5))

$b1
   Timestamp Rainfall
1 2010-01-13      298
2 2010-01-24      285
3 2010-01-08      275
4 2010-02-16      259
5 2010-02-08      250

$b2
   Timestamp Rainfall
1 2010-03-28      298
2 2010-03-10      285
3 2010-03-15      278
4 2010-01-20      269
5 2010-01-06      262

$b3
   Timestamp Rainfall
1 2010-03-10      292
2 2010-01-03      278
3 2010-01-21      272
4 2010-03-30      240
5 2010-03-09      237

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM