Need to count the number of times a threshold value is met (or exceeded) per year (using R)

I am working with several temperature datasets and trying to pull out when the temperature meets or exceeds a threshold value. Ideally, I want to know how many times (count) that value is met or exceeded each year for ~100 years of data, and when (what date) that value is first and last exceeded in each year.

The data is in a table (a .csv file read into R) with columns YR, MO, DA, TMAX.

For the first part, I have tried using subset to pull out all the times the temperature exceeds a value, but then I still have to add up each year by hand (time consuming):

subset(data, TMAX > 20.86)

I've figured out how to use count, but that gives me all the occurrences in the dataset:

count(data, vars = "TMAX")

I have also played around with summarise but gotten nowhere. Any help would be appreciated, especially for the second part of my question: finding the first and last occurrence each year.

Here is sample data. This is SeatlleTMAX (rather than data), as it holds the TMAX values for Seattle.

YR MO DA TMAX
1909 9 1 28.9
1909 9 2 30.0
1909 9 3 28.3
1909 9 4 33.9
1909 9 5 31.7
1909 9 6 28.3
1909 9 7 26.7
1909 9 8 23.3
1909 9 9 22.2
1909 9 10 17.8
1909 9 11 14.4
1909 9 12 25.6
1909 9 13 23.9
1909 9 14 25.0
1909 9 15 29.4
1909 9 16 28.3
1909 9 17 14.4
1909 9 18 21.7
1909 9 19 14.4
1909 9 20 13.3
1909 9 21 15.6
1909 9 22 20.6
1909 9 23 23.3
1909 9 24 20.0
1909 9 25 21.1
1909 9 26 22.2
1909 9 27 25.6
1909 9 28 22.2
1909 9 29 15.0
1909 9 30 12.2
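
Since no dput output was posted, one way to turn rows like these into a reproducible example is to read them from inline text. This is only a sketch: it uses the first ten sample rows, keeps the data frame name SeatlleTMAX from the question, and adds a date column whose name is an assumption.

```r
# First ten rows of the sample above, read from inline text
SeatlleTMAX <- read.table(header = TRUE, text = "
YR MO DA TMAX
1909 9 1 28.9
1909 9 2 30.0
1909 9 3 28.3
1909 9 4 33.9
1909 9 5 31.7
1909 9 6 28.3
1909 9 7 26.7
1909 9 8 23.3
1909 9 9 22.2
1909 9 10 17.8
")

# Build a Date column from the three integer columns
# ("date" is an illustrative column name, not part of the original file)
SeatlleTMAX$date <- as.Date(ISOdate(SeatlleTMAX$YR, SeatlleTMAX$MO, SeatlleTMAX$DA))

str(SeatlleTMAX)
```

With a real Date column in place, min() and max() can later be used directly to find the first and last exceedance.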

Adapting my comment into an answer, taking into account the presented data and the OP's comments. Note that the code is untested, as no dput of the data was provided.

library("dplyr")

data_summarised <-
    data %>%
    # combine YR, MO, DA into an ISO-style string and convert it to Date
    mutate(date = as.Date(paste(YR, MO, DA, sep = "-"))) %>%
    filter(TMAX > 20.86) %>%        # keep only days above the threshold
    group_by(YR) %>%
    summarise(number_of_days = n(), # count number of rows in each group
              first_date = min(date),
              last_date = max(date))

install.packages("dplyr")  # only needed once, if dplyr is not yet installed
library(dplyr)

data %>%
  group_by(YR) %>%
  summarize(n_break_threshold = sum(TMAX > 20.86))

This assumes your data is in a data.frame called data. What this code says, effectively, is: "Take data, set it up so that dplyr operations happen on groups of the data.frame defined by the unique values in the variable YR, and then run a summarize operation (i.e. one that returns a single value per group) that counts the number of times the relation TMAX > 20.86 is TRUE."
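
To make the grouping step concrete, here is a small sketch in base R on made-up values (the data frame toy and its numbers are illustrative, not taken from the question). Summing a logical vector counts its TRUE entries, which is what sum(TMAX > 20.86) relies on.

```r
# Illustrative data only: two years, three days each
toy <- data.frame(
  YR   = c(1909, 1909, 1909, 1910, 1910, 1910),
  TMAX = c(28.9, 17.8, 22.2, 14.4, 25.6, 13.3)
)

# TRUE/FALSE per row; sum() treats TRUE as 1 and FALSE as 0
exceeds <- toy$TMAX > 20.86

# Group the logical vector by year and sum within each group
n_break_threshold <- tapply(exceeds, toy$YR, sum)
print(n_break_threshold)
```

Because nothing is filtered out beforehand, a year with no days above the threshold still appears in the result, with a count of 0.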

You will probably notice that this is very similar to SQL, if you have used that before.

library(plyr)

Data example

The example covers a period of two years and draws random temperature values between 0 and 22.

dat<-seq(as.Date("2013/1/1"), as.Date("2014/12/31"), "days")
DA<-as.numeric(format(dat, "%d"))
MO<-as.numeric(format(dat, "%m"))
YR<-as.numeric(format(dat, "%Y"))
TMAX<-runif(length(dat), 0, 22)

df<-data.frame(dat, DA, MO, YR, TMAX)

Thres=20.86

Count for each month (irrespective of year)

ddply(df, .(MO), summarise, count = sum(TMAX>Thres)) 

Count for each month in each year

ddply(df, .(YR, MO), summarise, count = sum(TMAX>Thres)) 
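
If plyr is not available, the same year-by-month counts can be sketched in base R with aggregate(). The data are regenerated here with an arbitrary seed so the snippet stands alone; the exceeds column name is an assumption made for illustration.

```r
set.seed(1)  # arbitrary seed, only so the random example is repeatable
dat <- seq(as.Date("2013-01-01"), as.Date("2014-12-31"), by = "day")
df  <- data.frame(dat,
                  MO   = as.numeric(format(dat, "%m")),
                  YR   = as.numeric(format(dat, "%Y")),
                  TMAX = runif(length(dat), 0, 22))
Thres <- 20.86

# Logical flag per day, then sum the flags within each year/month cell
df$exceeds <- df$TMAX > Thres
counts <- aggregate(exceeds ~ YR + MO, data = df, FUN = sum)
head(counts)
```

The result has one row per year and month combination, so two full years give 24 rows.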

First day in each year that the temperature exceeds the threshold

temp<-ddply(df, .(YR, dat), summarise, count = sum(TMAX>Thres)) 
res<-subset(temp, count==1)
ddply(res, .(YR), summarise, min = min(dat))

Last day in each year that the temperature exceeds the threshold

ddply(res, .(YR), summarise, max = max(dat))
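
The count, first date, and last date can also be gathered in a single pass. The following is a base R sketch rather than the method used above; summarise_year is a hypothetical helper, and the small data frame is made up so the result can be checked by eye.

```r
# Illustrative data: a handful of days in two years
df <- data.frame(
  dat  = as.Date(c("2013-03-01", "2013-06-15", "2013-09-30",
                   "2014-02-10", "2014-07-04", "2014-11-20")),
  TMAX = c(21.0, 10.0, 21.5, 5.0, 21.9, 3.0)
)
df$YR <- as.numeric(format(df$dat, "%Y"))
Thres <- 20.86

# Hypothetical helper: summarise the rows belonging to one year
summarise_year <- function(d) {
  hit <- d$dat[d$TMAX > Thres]  # dates above the threshold
  data.frame(YR    = d$YR[1],
             count = length(hit),
             first = if (length(hit)) min(hit) else as.Date(NA),
             last  = if (length(hit)) max(hit) else as.Date(NA))
}

# Split by year, summarise each piece, and stack the results
res <- do.call(rbind, lapply(split(df, df$YR), summarise_year))
print(res)
```

A year with no exceedances keeps its row, with a count of 0 and NA for both dates, which avoids the warning that min() and max() give on empty input.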
