

Need to count the number of times a threshold value is met (or exceeded) per year (using R)

I am working with several temperature datasets and trying to pull out when the temperature meets or exceeds a threshold value. Ideally, I want to know how many times (count) that value is met or exceeded each year over ~100 years of data, and when (on what date) that value is first and last exceeded in each year.

The data is in a table (a .csv file read into R) with columns YR, MO, DA, TMAX.

For the first part, I have tried using subset to pull out all the times the temperature exceeds a value, but then I still have to add up each year by hand (time consuming):

subset(data, TMAX > 20.86)

I've figured out how to use count, but that gives me all the occurrences in the dataset:

count(data, vars = "TMAX")

I have also played around with summarise but have gotten nowhere. Any help would be appreciated, especially for the second part of my question: finding the first and last occurrence each year.

Here is sample data. The data frame is named SeatlleTMAX (rather than data), as it holds the TMAX values for Seattle.

YR   MO DA TMAX
1909  9  1 28.9
1909  9  2 30.0
1909  9  3 28.3
1909  9  4 33.9
1909  9  5 31.7
1909  9  6 28.3
1909  9  7 26.7
1909  9  8 23.3
1909  9  9 22.2
1909  9 10 17.8
1909  9 11 14.4
1909  9 12 25.6
1909  9 13 23.9
1909  9 14 25.0
1909  9 15 29.4
1909  9 16 28.3
1909  9 17 14.4
1909  9 18 21.7
1909  9 19 14.4
1909  9 20 13.3
1909  9 21 15.6
1909  9 22 20.6
1909  9 23 23.3
1909  9 24 20.0
1909  9 25 21.1
1909  9 26 22.2
1909  9 27 25.6
1909  9 28 22.2
1909  9 29 15.0
1909  9 30 12.2

Adapting my comment into an answer, taking into account the presented data and the OP's comments. Note: the code is untested, as no dput of the data was provided.


library(dplyr)

data_summarised <-
    data %>%
    # concatenate YR, MO, DA into a year-month-day string and convert it to Date
    mutate(date = as.Date(paste(YR, MO, DA, sep = "-"))) %>%
    # keep only the days above the threshold
    filter(TMAX > 20.86) %>%
    group_by(YR) %>%
    summarise(number_of_days = n(), # count number of rows in each group
              first_date = min(date),
              last_date = max(date))
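Because no dput() output was available, one way to try the logic is to read a few of the posted rows straight from text. The sketch below uses only base R so it runs without extra packages; the object name sample_rows is made up here, and the five rows are copied from the question's sample. It is a quick check of the filter, count, min and max steps under those assumptions, not a replacement for the pipeline above.

```r
# Five rows copied from the question's sample (8 to 12 September 1909).
# sample_rows is a hypothetical name, not one used in the question.
sample_rows <- read.table(header = TRUE, text = "
YR MO DA TMAX
1909 9 8 23.3
1909 9 9 22.2
1909 9 10 17.8
1909 9 11 14.4
1909 9 12 25.6
")

# as.Date() accepts unpadded months and days such as "1909-9-8"
sample_rows$date <- as.Date(
  paste(sample_rows$YR, sample_rows$MO, sample_rows$DA, sep = "-")
)

# keep the days above the threshold, then count them and take the extremes
above      <- sample_rows[sample_rows$TMAX > 20.86, ]
n_days     <- nrow(above)
first_date <- min(above$date)
last_date  <- max(above$date)

print(data.frame(YR = 1909, n_days, first_date, last_date))
```

On these five rows, three days are above 20.86, the first on 1909-09-08 and the last on 1909-09-12.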

library(dplyr)

data %>%
  group_by(YR) %>%
  summarize(n_break_threshold = sum(TMAX > 20.86))

This assumes your data is in a data.frame called data. What this code says, effectively, is: "Take data, set it up so that dplyr operations happen on groups of the data.frame defined by the unique values of the variable YR, and then run a summarize operation (i.e. one that returns an atomic vector, here a single count per group) that counts the number of times the relation TMAX > 20.86 is TRUE."

You will probably notice this is very similar to SQL if you have used that before.
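Two details may matter for ~100 years of data. Filtering before grouping drops any year with no exceedance at all, and the question asks for "meets or exceeds", which is >= rather than >. The following is a minimal base R sketch of a per-year table that keeps such years (count 0, dates NA) and uses >=. The toy data, the object names (toy, threshold, per_year) and the helper summarise_year are all made up for illustration.

```r
# Hypothetical toy data in the question's layout; the values are invented.
toy <- data.frame(
  YR   = c(1909, 1909, 1909, 1910, 1910, 1911),
  MO   = c(9, 9, 9, 7, 8, 1),
  DA   = c(1, 10, 12, 4, 20, 5),
  TMAX = c(28.9, 17.8, 25.6, 30.1, 22.0, 5.0)
)
threshold <- 20.86  # threshold value taken from the question

toy$date <- as.Date(paste(toy$YR, toy$MO, toy$DA, sep = "-"))

# Hypothetical helper: summarise the rows of one year.
summarise_year <- function(d) {
  hit <- d$TMAX >= threshold  # ">=" so that meeting the threshold also counts
  data.frame(
    YR         = d$YR[1],
    n_days     = sum(hit),
    # years with no exceedance get NA instead of being dropped
    first_date = if (any(hit)) min(d$date[hit]) else as.Date(NA),
    last_date  = if (any(hit)) max(d$date[hit]) else as.Date(NA)
  )
}

# split by year, summarise each piece, and stack the one-row results
per_year <- do.call(rbind, lapply(split(toy, toy$YR), summarise_year))
print(per_year)
```

Here 1911 appears with n_days equal to 0 and NA dates, whereas a filter-then-group approach would omit that year from the result.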

library(plyr)

Data example

The example covers a period of two years and draws random temperature values from 0 to 22.

dat<-seq(as.Date("2013/1/1"), as.Date("2014/12/31"), "days")
DA<-as.numeric(format(dat, "%d"))
MO<-as.numeric(format(dat, "%m"))
YR<-as.numeric(format(dat, "%Y"))
TMAX<-runif(length(dat), 0, 22)

df<-data.frame(dat, DA, MO, YR, TMAX)


Count for each month (irrespective of year)

Thres <- 20.86  # assumption: Thres is never defined in this answer; 20.86 is the threshold from the question
ddply(df, .(MO), summarise, count = sum(TMAX > Thres))

Count for each month in each year

ddply(df, .(YR, MO), summarise, count = sum(TMAX > Thres))
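If a year-by-month grid is easier to read than the long format, base R's xtabs can produce one, and it shows empty combinations as 0. This is only a sketch on made-up data: toy, hit and counts are hypothetical names, and Thres is set to the question's 20.86.

```r
# Hypothetical toy data; the values are invented for illustration.
toy <- data.frame(
  YR   = c(2013, 2013, 2013, 2014, 2014, 2014),
  MO   = c(1, 1, 7, 1, 7, 7),
  TMAX = c(21.5, 10.0, 21.0, 5.0, 21.9, 22.0)
)
Thres <- 20.86  # threshold value taken from the question

# 1 when the day exceeds the threshold, 0 otherwise
toy$hit <- as.integer(toy$TMAX > Thres)

# summing the indicator within each YR x MO cell gives the count of exceedances
counts <- xtabs(hit ~ YR + MO, data = toy)
print(counts)
```

Rows are years and columns are months; for instance, counts["2014", "7"] is 2 for this toy data.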

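First day that temperature exceeds threshold for each year

# one row per day: count is 1 if that day's TMAX exceeds Thres, otherwise 0
temp <- ddply(df, .(YR, dat), summarise, count = sum(TMAX > Thres))
# keep only the days on which the threshold was exceeded
res <- subset(temp, count == 1)
ddply(res, .(YR), summarise, min = min(dat))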

Last day that temperature exceeds threshold for each year

# reuses res (the exceedance days) from the previous step
ddply(res, .(YR), summarise, max = max(dat))
