I am working with several temperature datasets and trying to pull out when the temperature meets or exceeds a threshold value. Ideally, I want to know how many times (count) that value is met or exceeded each year for ~100 years of data, AND on what date that value is first exceeded and last exceeded in each year.
The data is in a table (a .csv file brought into R) with columns YR, MO, DA, TMAX.
For the first part, I have tried using subset to pull out all the times the temperature exceeds a value, but then I still have to add up each year by hand, which is time consuming:
subset(data, TMAX > 20.86)
I've figured out how to use count, but that gives me all the occurrences in the dataset:
count(data, vars = "TMAX")
And I have played around with summarise but gotten nowhere. Any help would be appreciated, especially for the second part of my question: finding the first and last occurrence each year.
Here is sample data. This is SeatlleTMAX (rather than data) as it is the TMAX values for Seattle.
YR MO DA TMAX
1909 9 1 28.9
1909 9 2 30.0
1909 9 3 28.3
1909 9 4 33.9
1909 9 5 31.7
1909 9 6 28.3
1909 9 7 26.7
1909 9 8 23.3
1909 9 9 22.2
1909 9 10 17.8
1909 9 11 14.4
1909 9 12 25.6
1909 9 13 23.9
1909 9 14 25.0
1909 9 15 29.4
1909 9 16 28.3
1909 9 17 14.4
1909 9 18 21.7
1909 9 19 14.4
1909 9 20 13.3
1909 9 21 15.6
1909 9 22 20.6
1909 9 23 23.3
1909 9 24 20.0
1909 9 25 21.1
1909 9 26 22.2
1909 9 27 25.6
1909 9 28 22.2
1909 9 29 15.0
1909 9 30 12.2
Adapting my comment into an answer, taking into account the presented data and OP's comments. Note: the code is untested, as no dput of the data was provided.
library("dplyr")
data_summarised <-
data %>%
mutate(date = as.Date(paste(YR, MO, DA, sep = "-"))) %>% # concatenate YR MO DA into an ISO date, convert column into date type
filter(TMAX > 20.86) %>%
group_by(YR) %>%
summarise(number_of_days = n(), # count number of rows in each group
first_date = min(date),
last_date = max(date))
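One thing to be aware of: because filter() runs before group_by() in the pipeline above, a year with no days over the threshold is dropped from the result entirely. A minimal sketch of a variant that keeps such years (with a count of 0 and NA dates) follows. The data frame `toy`, the variable `threshold`, and the helper column `hit` are made up for illustration, and it uses `>=` on the assumption that "meets or exceeds" should include days exactly at the threshold.

```r
library(dplyr)

# Hypothetical toy data: 1909 has days at or above the threshold, 1910 has none
toy <- data.frame(
  YR   = c(1909, 1909, 1909, 1910, 1910),
  MO   = c(9, 9, 9, 1, 1),
  DA   = c(1, 10, 12, 1, 2),
  TMAX = c(28.9, 17.8, 25.6, 5.0, 6.1)
)

threshold <- 20.86

toy_summarised <- toy %>%
  mutate(date = as.Date(paste(YR, MO, DA, sep = "-")),  # build a Date column
         hit  = TMAX >= threshold) %>%                  # TRUE on qualifying days
  group_by(YR) %>%
  summarise(number_of_days = sum(hit),                  # TRUE counts as 1
            # fall back to NA when a year has no qualifying day
            first_date = if (any(hit)) min(date[hit]) else as.Date(NA),
            last_date  = if (any(hit)) max(date[hit]) else as.Date(NA))

toy_summarised
```

Whether a year with no exceedances should appear as a row of 0/NA or simply be absent depends on what you plan to do with the summary afterwards.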
install.packages("dplyr")
library(dplyr)
data %>%
group_by(YR) %>%
summarize(n_break_threshold=sum(TMAX > 20.86))
This assumes your data is in a data.frame called data. What this code says effectively is "Take data, set it up so that dplyr operations happen on groups of the data.frame composed of the unique values in the variable YR, and then run a summarize operation (i.e. one that returns an atomic vector) that counts the number of times the relation TMAX > 20.86 is TRUE."
You will probably notice this is very similar to SQL if you have used that before.
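If it helps to see the same split-by-year-then-count idea without dplyr, here is a rough base R sketch. The data frame `toy` is invented for illustration and only mimics the question's column names.

```r
# Hypothetical toy data with the same column names as the question
toy <- data.frame(
  YR   = c(1909, 1909, 1909, 1910, 1910),
  TMAX = c(28.9, 17.8, 25.6, 21.0, 6.1)
)

# TMAX > 20.86 gives TRUE/FALSE per row; tapply() splits that vector by YR
# and sum() counts the TRUEs within each year
counts <- tapply(toy$TMAX > 20.86, toy$YR, sum)
counts
```

The result is a named vector with one count per year, which is the same information the group_by/summarize version returns as a data frame.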
library(plyr)
Data example
The example covers a period of two years and draws random temperature values between 0 and 22.
dat<-seq(as.Date("2013/1/1"), as.Date("2014/12/31"), "days")
DA<-as.numeric(format(dat, "%d"))
MO<-as.numeric(format(dat, "%m"))
YR<-as.numeric(format(dat, "%Y"))
TMAX<-runif(length(dat), 0, 22)
df<-data.frame(dat, DA, MO, YR, TMAX)
Thres=20.86
Count for each month (irrespective of year)
ddply(df, .(MO), summarise, count = sum(TMAX>Thres))
Count for each month in each year
ddply(df, .(YR, MO), summarise, count = sum(TMAX>Thres))
First day that temperature exceeds threshold for each year
temp<-ddply(df, .(YR, dat), summarise, count = sum(TMAX>Thres))
res<-subset(temp, count==1)
ddply(res, .(YR), summarise, min = min(dat))
Last day that temperature exceeds threshold for each year
ddply(res, .(YR), summarise, max = max(dat))
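The count, first day and last day per year can probably also be obtained in a single ddply call by subsetting to the days above the threshold first. This is only a sketch: the seed value and the names `hot`, `first` and `last` are arbitrary choices for illustration, and the data are rebuilt here so the block runs on its own.

```r
library(plyr)

set.seed(1)  # arbitrary seed so the random temperatures are reproducible
dat   <- seq(as.Date("2013-01-01"), as.Date("2014-12-31"), by = "day")
df    <- data.frame(dat,
                    YR   = as.numeric(format(dat, "%Y")),
                    TMAX = runif(length(dat), 0, 22))
Thres <- 20.86

# Keep only the days above the threshold, then summarise once per year
hot <- subset(df, TMAX > Thres)
res <- ddply(hot, .(YR), summarise,
             count = length(dat),  # number of days above the threshold
             first = min(dat),     # earliest such day in the year
             last  = max(dat))     # latest such day in the year
res
```

As with any approach that filters first, a year with no days above the threshold would not appear in the output.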