
Need to count the number of times a threshold value is met (or exceeded) per year (using R)

I am working with several temperature datasets and trying to find when the temperature meets or exceeds a threshold value. Ideally, I want to know how many times (count) that value is met or exceeded each year over ~100 years of data, and on what date the value is first and last exceeded in each year.

Data is in a table (.csv file brought into R) with columns YR, MO, DA, TMAX

For the first part, I have tried using subset to pull out all the times the temperature exceeds a value, but then I still have to add up each year by hand, which is time-consuming:

subset(data, TMAX > 20.86)

I've figured out how to use count, but that gives me all the occurrences in the dataset:

count(data, vars = "TMAX")

And I have played around with summarise but gotten nowhere. Any help would be appreciated, especially for the second part of my question: finding the first and last occurrence each year.

Here is sample data. This is SeatlleTMAX (rather than data), as it holds the TMAX values for Seattle.

YR MO DA TMAX
1909 9 1 28.9
1909 9 2 30.0
1909 9 3 28.3
1909 9 4 33.9
1909 9 5 31.7
1909 9 6 28.3
1909 9 7 26.7
1909 9 8 23.3
1909 9 9 22.2
1909 9 10 17.8
1909 9 11 14.4
1909 9 12 25.6
1909 9 13 23.9
1909 9 14 25.0
1909 9 15 29.4
1909 9 16 28.3
1909 9 17 14.4
1909 9 18 21.7
1909 9 19 14.4
1909 9 20 13.3
1909 9 21 15.6
1909 9 22 20.6
1909 9 23 23.3
1909 9 24 20.0
1909 9 25 21.1
1909 9 26 22.2
1909 9 27 25.6
1909 9 28 22.2
1909 9 29 15.0
1909 9 30 12.2

Adapting my comment into an answer, taking into account the presented data and OP's comments. Note: the code is untested because no dput of the data was provided.

library("dplyr")

data_summarised <-
    data %>% 
    mutate(date = as.Date(paste(YR, MO, DA, sep = "-"))) %>% # concatenate YR MO DA into an ISO date, convert column into date type 
    filter(TMAX > 20.86) %>%
    group_by(YR) %>%
    summarise(number_of_days = n(), # count number of rows in each group
              first_date = min(date),
              last_date = max(date))
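
Since the answer above is marked as untested, the date-building step can be checked on its own. This is a minimal sketch in base R using the first row of the sample data (1909, 9, 1); the variable name d is only for illustration.

```r
# Build an ISO-style string from year, month and day, then convert it to a Date.
# as.Date() accepts single-digit months and days in the "%Y-%m-%d" layout.
d <- as.Date(paste(1909, 9, 1, sep = "-"))

class(d)   # "Date"
format(d)  # "1909-09-01"
```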
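
install.packages("dplyr")  # only needed once, if dplyr is not installed yet
library(dplyr)

data %>%
  group_by(YR) %>%
  summarize(n_break_threshold = sum(TMAX > 20.86))  # TRUE counts as 1, so sum() counts the days above the threshold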

This assumes your data is in a data.frame called data. In effect, the code says: "Take data, group it by the unique values of the variable YR so that dplyr operations run on each group, then run a summarize operation (i.e. one that returns a single value per group) that counts the number of times the relation TMAX > 20.86 is TRUE."
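
The counting step works because R treats TRUE as 1 and FALSE as 0 when a logical vector is summed. A small base R sketch, using the first ten TMAX values from the sample data in the question (the names tmax and exceeds are only for illustration):

```r
# TMAX for 1-10 September 1909, copied from the sample data
tmax <- c(28.9, 30.0, 28.3, 33.9, 31.7, 28.3, 26.7, 23.3, 22.2, 17.8)

# The comparison gives one TRUE/FALSE per day
exceeds <- tmax > 20.86

# Summing the logical vector counts the TRUE entries
sum(exceeds)  # 9: only the last value, 17.8, is below the threshold
```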

You will probably notice this is very similar to SQL if you have used that before.

library(plyr)

Data example

The example covers a period of two years and draws random temperature values from 0 to 22.

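set.seed(1)  # arbitrary seed, added so the random example is reproducible
dat <- seq(as.Date("2013/1/1"), as.Date("2014/12/31"), "days")  # one entry per day
DA <- as.numeric(format(dat, "%d"))
MO <- as.numeric(format(dat, "%m"))
YR <- as.numeric(format(dat, "%Y"))
TMAX <- runif(length(dat), 0, 22)  # random temperatures between 0 and 22

df <- data.frame(dat, DA, MO, YR, TMAX)

Thres <- 20.86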

Count for each month (irrespective of year)

ddply(df, .(MO), summarise, count = sum(TMAX>Thres)) 

Count for each month in each year

ddply(df, .(YR, MO), summarise, count = sum(TMAX>Thres)) 

First day that temperature exceeds threshold for each year

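# each (YR, dat) group holds a single day, so count is 1 when that day exceeds the threshold
temp <- ddply(df, .(YR, dat), summarise, count = sum(TMAX > Thres))
res <- subset(temp, count == 1)  # keep only the days above the threshold
ddply(res, .(YR), summarise, min = min(dat))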

Last day that temperature exceeds threshold for each year

ddply(res, .(YR), summarise, max = max(dat))
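
The count, first date and last date can also be produced in one table per year. The following is a sketch in base R, not a tested drop-in for the real data. It assumes the column names YR, MO, DA and TMAX from the question; the 1909 rows are taken from the sample data, while the 1910 rows are made up for illustration so that one year has no exceedance. The helper name summarise_year is hypothetical.

```r
# Toy data: 1909 rows come from the question's sample; 1910 rows are invented
toy <- data.frame(
  YR   = c(1909, 1909, 1909, 1909, 1910, 1910),
  MO   = c(9, 9, 9, 9, 1, 1),
  DA   = c(9, 10, 11, 12, 1, 2),
  TMAX = c(22.2, 17.8, 14.4, 25.6, 5.0, 6.1)
)
toy$date <- as.Date(paste(toy$YR, toy$MO, toy$DA, sep = "-"))
threshold <- 20.86

# Hypothetical helper: summarise one year's rows into a single-row data frame
summarise_year <- function(g) {
  hit <- g$TMAX > threshold
  data.frame(
    YR    = g$YR[1],
    count = sum(hit),
    # years with no exceedance get NA dates instead of being dropped
    first = if (any(hit)) min(g$date[hit]) else as.Date(NA),
    last  = if (any(hit)) max(g$date[hit]) else as.Date(NA)
  )
}

# Split by year, summarise each piece, and stack the results
per_year <- do.call(rbind, lapply(split(toy, toy$YR), summarise_year))
rownames(per_year) <- NULL
print(per_year)
```

Unlike approaches that filter the rows first, this keeps a row for every year, with a count of 0 and NA dates when the threshold is never exceeded.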

