
Subsetting dates in a nested list with dates in data frame in R

I have a nested list with dates and river streamflow data (Flow) in different river reaches (910, 950, 1012, 1087):

library(tibble)

Flowtest <- list("910" = tibble(date=c("2017/08/01","2017/08/02","2017/08/03","2017/08/04",
                                             "2017/08/05","2017/08/06","2017/08/07"),
                                Flow=c(123, 170, 187, 245, 679, 870, 820)),
                 "950" = tibble(date=c("2017/08/01","2017/08/02","2017/08/03","2017/08/04",
                                             "2017/08/05","2017/08/06","2017/08/07"),
                                Flow=c(570, 450, 780, 650, 230, 470, 340)),
                 "1012" = tibble(date=c("2017/08/01","2017/08/02","2017/08/03","2017/08/04",
                                              "2017/08/05","2017/08/06","2017/08/07"),
                                 Flow=c(160, 170, 670, 780, 350, 840, 850)),
                 "1087" = tibble(date=c("2017/08/01","2017/08/02","2017/08/03","2017/08/04",
                                              "2017/08/05","2017/08/06","2017/08/07"),
                                 Flow=c(120, 780, 820, 580, 870, 870, 840)))

Flowtest1 <- lapply(Flowtest, function(x) {
  list(date1 = as.Date(x$date), Flow = x$Flow)
})

I only need the days listed in the vector below:

dates_FF <- as.Date("2017/08/05","2017/08/06")

I want to filter Flowtest1 so that it keeps only the dates included in dates_FF and the Flow values recorded on those days. I tried this:

Result_FF <- lapply(Flowtest1, function(x) {
  x[x$date %in% dates_FF, ]  })

which results in the following error:

Error in x[x$date %in% dates_FF, ]: incorrect number of dimensions

I want to achieve something like this:

Result <- list("910" = tibble(date=c("2017/08/05","2017/08/06"),
                                Flow=c(679, 870)),
                 "950" = tibble(date=c("2017/08/05","2017/08/06"),
                                Flow=c( 230, 470)),
                 "1012" = tibble(date=c("2017/08/05","2017/08/06"),
                                 Flow=c(350, 840)),
                 "1087" = tibble(date=c("2017/08/05","2017/08/06"),
                                 Flow=c(870, 870)))

What does the error mean, and how can I fix it?

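Each element of Flowtest1 isn't a data frame, just a plain list of two vectors. A plain list has no rows or columns, so you can't use two-index bracket notation (x[i, ]) to subset both vectors at once; that is what "incorrect number of dimensions" means. Also, dates_FF is created by passing two separate arguments to as.Date instead of one vector, so the second string is not treated as a date and the call doesn't give you the two dates you want. as.Date is vectorized, so if you wrap the two input dates in c() it will work. This returns what you want, with no need for Flowtest1: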

dates_FF <- as.Date(c("2017/08/05","2017/08/06"))
Result_FF <- lapply(Flowtest, function(x) x[as.Date(x$date) %in% dates_FF,])

If the format of the dates is completely consistent and you only want a couple of dates, you could also compare against a character vector and skip the date conversion entirely:

dates_FF <- c("2017/08/05","2017/08/06")
Result_FF <- lapply(Flowtest, function(x) x[x$date %in% dates_FF,])
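The diagnosis above (a plain list cannot take two-index subsetting, and as.Date does not read a second positional string as another date) can be checked with a small sketch. The object names x, msg, bad_dates and good_dates are made up for illustration, and x is toy data standing in for a single element of Flowtest1:

```r
# Toy stand-in for one element of Flowtest1 (hypothetical data, three days only)
x <- list(date1 = as.Date(c("2017/08/04", "2017/08/05", "2017/08/06")),
          Flow  = c(245, 679, 870))

# A plain list has no dim attribute, so there are no rows or columns to index
is.null(dim(x))

# Two-index subsetting therefore raises an error; capture its message
msg <- tryCatch(x[c(FALSE, TRUE, TRUE), ],
                error = function(e) conditionMessage(e))
msg

# Two positional strings: the second is matched to the `format` argument,
# so it is not parsed as a second date
bad_dates <- as.Date("2017/08/05", "2017/08/06")
bad_dates

# One character vector: both strings are parsed as dates
good_dates <- as.Date(c("2017/08/05", "2017/08/06"))
good_dates
```

Printing bad_dates and good_dates side by side is usually enough to spot this kind of problem before it reaches the filtering step.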

If you want to keep working with your nested list, you could use Map to subset every vector in each list element:

dates_FF <- as.Date(c("2017/08/05","2017/08/06")) # fixed
Result_FF <- lapply(Flowtest1, function(x) Map(`[`, x, list(x$date1 %in% dates_FF)))

You can then get your desired Result list (with the date column named date1) by using bind_rows from dplyr:

lapply(Result_FF, dplyr::bind_rows)
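If dplyr is not available, the same idea can be sketched in base R. This is only an illustration under assumptions: nested is a toy list with the same shape as Flowtest1 (two reaches, three days each), and the names nested, wanted, filtered and result are hypothetical. It returns base data frames rather than tibbles:

```r
# Toy nested list shaped like Flowtest1 (hypothetical data)
nested <- list(
  "910" = list(date1 = as.Date(c("2017/08/04", "2017/08/05", "2017/08/06")),
               Flow  = c(245, 679, 870)),
  "950" = list(date1 = as.Date(c("2017/08/04", "2017/08/05", "2017/08/06")),
               Flow  = c(650, 230, 470))
)
wanted <- as.Date(c("2017/08/05", "2017/08/06"))

# Build one logical index per reach and apply it to every vector in that reach
filtered <- lapply(nested, function(x) {
  keep <- x$date1 %in% wanted
  lapply(x, function(v) v[keep])
})

# Base R alternative to bind_rows: each inner list becomes a data frame
result <- lapply(filtered, as.data.frame)
result[["910"]]
```

Computing the logical index once per reach keeps the date and Flow vectors aligned, which is the same thing the Map call does by recycling a single index across both vectors.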

As mentioned in the other answer, you do not seem to need the nested list (Flowtest1) here; using Flowtest directly would be the easiest option.
