
Conditional subsetting between dates in R

I am matching precipitation isotope values to the date of precipitation events. Sample collection occurred on a 7-10 day basis, and I want to find how many samples capture a single day of precipitation. I aim to create a new data frame with a date, precipitation amount, and isotope value.
Here are some example data. The data frame demonstrates the structure of what I have scraped together from several repositories.

# example dates over two week period
start <- as.Date('2017/01/01')
len <- 14
dates <- seq(start, by = "day", length.out = len)

# example precip events in total mm accumulation 
prcp <- c(0, 1.0, 2.0, 0, 0, 0, 0,  # week 1
          0, 1.75, 0, 0, 0, 0, 0)   # week 2

# sample measurements (numeric)
samp <- c(NA, NA, NA, NA, -15.0, NA, NA,
          NA, NA, NA, NA, NA, NA, -12.0) 

# df showing dates, the recorded precip, and the sample measurements
# notice that sample values are associated with collection date
raw <- data.frame(dates, prcp, samp)

In this example, there are two sample measurements. The first one (-15) corresponds with two days of precipitation during the first week. The second sample value (-12) corresponds to a single recorded day of precipitation.
I am interested in grabbing the second sample value and matching it to the date and amount recorded on 2017-01-09. Although the sample was collected on 2017-01-14, it reflects the rainfall event and conditions that existed on 2017-01-09.

So far, I have made lists of the row indices that have precipitation and of those that have sample measurements. I am attempting to look at the rows between consecutive sample dates. If that interval contains exactly one day of precipitation, I want to match its date with the later sample date. If the single precipitation date falls on the sample date itself, I want to match them as well. I would discard every other case, that is, any interval with zero or more than one precipitation day.

Thank you for any help on my approach or alternative methods and suggestions!

Provided I understood you correctly (see my comment), here is an option:

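library(dplyr)
library(tidyr)      # fill() comes from tidyr, not dplyr
library(lubridate)
raw %>%
    mutate(week = week(dates)) %>%                # week of the year for each date
    group_by(week) %>%
    filter(sum(prcp > 0) == 1) %>%                # weeks with exactly one wet day
    fill(samp, .direction = "downup") %>%         # spread the sample value over the week
    slice_max(prcp) %>%                           # keep the row with the precipitation
    ungroup()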
## A tibble: 1 x 4
#  dates       prcp  samp  week
#  <date>     <dbl> <dbl> <dbl>
#1 2017-01-09  1.75   -12     2

Explanation: Determine the week of each date, group by week, and keep only those weeks with exactly one day of precipitation. Within each remaining week, fill the NAs in samp with that week's sample value. Keep the single row per week that has non-zero precipitation, then ungroup.

If you don't need the sample value, you can skip the fill step. If you don't want the week column, remove it with select(-week) at the end.
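
Grouping by calendar week only works when each sample closes out exactly one week. Since the question mentions a 7-10 day collection cadence, here is a sketch of a variant that groups by collection interval instead, i.e. the rows after one sample up to and including the next. The interval column and the matched name are my own additions, and I am assuming that a sample collected on a rainy day includes that day's rain.

```r
library(dplyr)
library(tidyr)

# Example data from the question
dates <- seq(as.Date("2017-01-01"), by = "day", length.out = 14)
prcp <- c(0, 1.0, 2.0, 0, 0, 0, 0,
          0, 1.75, 0, 0, 0, 0, 0)
samp <- c(NA, NA, NA, NA, -15.0, NA, NA,
          NA, NA, NA, NA, NA, NA, -12.0)
raw <- data.frame(dates, prcp, samp)

matched <- raw %>%
    # Interval id: the counter goes up on the row AFTER each collection day,
    # so a collection day is the last row of its own interval (assumption).
    mutate(interval = lag(cumsum(!is.na(samp)), default = 0L)) %>%
    group_by(interval) %>%
    # Keep intervals that end in a sample and contain exactly one wet day
    filter(any(!is.na(samp)), sum(prcp > 0) == 1) %>%
    # Carry the sample value back over the interval it represents
    fill(samp, .direction = "up") %>%
    # Keep only the day on which the precipitation fell
    filter(prcp > 0) %>%
    ungroup() %>%
    select(dates, prcp, samp)

print(matched)
```

On the example data this should leave the same single row as above (2017-01-09, 1.75 mm, -12). Trailing days after the last sample are dropped because their interval has no sample value.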
