简体   繁体   English

在 R 中的每日时间序列数据中逐年连续存在

[英]Presence in consecutive day by year in daily time-series data in R

I have daily time-series data for 60 years about the presence and absence of rainfall for 400 stations.我有 60 年来关于 400 个站点有无降雨的每日时间序列数据。 The data is in the following format where, in the second column, 1 indicates presence and 0 indicate absence:数据采用以下格式,其中,在第二列中,1 表示存在,0 表示不存在:

Date         Rainfall
---------------------
1981-01-01   0
1981-01-02   0
1981-01-03   0
1981-01-04   1
1981-01-05   0
1981-01-06   1
1981-01-07   1
1981-01-08   1
1981-01-09   0
1981-01-10   0
1981-01-11   1
1981-01-12   1
1981-01-13   1
1981-01-14   1
1981-01-15   1
1981-01-16   0
..........   .

Now I have to calculate the number of consecutive wet days for each year when at least 3 consecutive days received rainfall and the longest consecutive days of rainfall in a year.现在我必须计算每年至少连续 3 天降雨的连续潮湿天数和一年中最长的连续降雨天数。 If 3 or more than 3 consecutive days (any number) received rainfall I will consider it as a single event.如果连续 3 天或超过 3 天(任意数量)下雨,我会将其视为单一事件。

My output will be like this我的输出将是这样的

Year      No of consecutive wet-days   longest consecutive wet-days
1981      2                            5
.
.

How can we do this in R?我们如何在 R 中做到这一点? If I can solve for a station, I can iterate for all stations in R.如果我可以解决一个站,我可以迭代 R 中的所有站。

Thank you in advance for your help :)预先感谢您的帮助 :)

Another possible solution (I thank @DarrenTsai for his comments, which have improved this solution):另一个可能的解决方案(我感谢@DarrenTsai 的评论,他改进了这个解决方案):

library(tidyverse)
library(lubridate)

df %>% 
  group_by(Year = year(ymd(Date))) %>%
  mutate(x = list(rle(Rainfall))) %>% 
  summarise(ncons = sum(x[[1]]$lengths >= 3 & x[[1]]$values == 1),
            longest = ifelse(sum(x[[1]]$values == 1) == 0, 0, 
                max(x[[1]]$lengths[x[[1]]$values == 1])))

#> # A tibble: 2 × 3
#>    Year ncons longest
#>   <dbl> <int>   <int>
#> 1  1981     2       5
#> 2  1982     2       4

You can create event with rle .您可以使用rle创建事件。

library(dplyr)
df <- df %>%
  mutate(event = with(rle(Rainfall), rep(seq_along(lengths), lengths))) 

         Date Rainfall event
1  1981-01-01        0     1
2  1981-01-02        0     1
3  1981-01-03        0     1
4  1981-01-04        1     2
5  1981-01-05        0     3
6  1981-01-06        1     4
7  1981-01-07        1     4
8  1981-01-08        1     4
9  1981-01-09        0     5
10 1981-01-10        0     5
11 1981-01-11        1     6
12 1981-01-12        1     6
13 1981-01-13        1     6
14 1981-01-14        1     6
15 1981-01-15        1     6
16 1981-01-16        0     7

With that you can tally up the number of consecutive days for each rainfall event.这样,您就可以计算每次降雨事件的连续天数。

df %>% filter(Rainfall == 1) %>% group_by(event) %>% tally()
# A tibble: 3 x 2
  event     n
  <int> <int>
1     2     1
2     4     3
3     6     5

Further wrangling with Year and counting the tallied up rain event will give you your expected summary.与 Year 进一步争论并计算统计的降雨事件将为您提供预期的总结。

Another possible solution, Though I think already better solutions have been suggested另一种可能的解决方案,虽然我认为已经提出了更好的解决方案

library(lubridate)
library(dplyr)
library(tibble)


rainfall_data <- tibble::tribble(
      ~ date, ~rainfall,
       "1981-01-01",   0,
       "1981-01-02",   0,
       "1981-01-03",   0,
       "1981-01-04",   1,
       "1981-01-05",   0,
       "1981-01-06",   1,
       "1981-01-07",   1,
       "1981-01-08",   1,
       "1981-01-09",   0,
       "1981-01-10",   0,
       "1981-01-11",   1,
       "1981-01-12",   1,
       "1981-01-13",   1,
       "1981-01-14",   1,
       "1981-01-15",   1,
       "1981-01-16",   0
)


rainfall_data %>%
    mutate(
        csum = ave(rainfall, cumsum(rainfall == 0), FUN = cumsum),
        event = ave(csum, cumsum(rainfall == 0), FUN = max)
    ) %>%
    filter(event >= 3) %>%
    distinct(event, .keep_all = TRUE) %>%
    group_by(year = year(ymd(date))) %>%
    summarise(
        No_of_consecutive_wet_days = n(),
        longest_consecutive_wet_days = max(event)
    )
#> # A tibble: 1 × 3
#>    year No_of_consecutive_wet_days longest_consecutive_wet_days
#>   <dbl>                      <int>                        <dbl>
#> 1  1981                          2                            5

Created on 2022-07-06 by the reprex package (v2.0.1)由 reprex 包 (v2.0.1) 创建于 2022-07-06

A simple solution using only {base} R functions, particularly diff and tapply .仅使用 {base} R 函数的简单解决方案,尤其是difftapply The summary statistics pertain to events with a start date in that year.汇总统计信息与开始日期在该年的事件有关。

date <- seq(as.Date("2000/1/1"), as.Date("2010/1/1"), "days")
rainfall <- sample(c(0,1),length(date), replace = T)
df <- data.frame(date,rainfall)

events <- df$date[which(c(1,diff(df$rainfall)) > 0)]
event.lengths <- tapply(df$rainfall, cumsum(c(1, diff(df$rainfall) > 0 )), sum)
df.events <- data.frame(events,event.lengths)

Total_rain_days <- tapply(df.events$event.lengths, format(df.events$events, format = "%Y"),sum)
Max_consecutive_rain_days <- tapply(df.events$event.lengths, format(df.events$events, format = "%Y"),max)
Year <- names(Total_rain_days)

output <- data.frame(Year, Total_rain_days, Max_consecutive_rain_days)
output

> output
     Year Total_rain_days Max_consecutive_rain_days
2000 2000             192                         8
2001 2001             175                         9
2002 2002             196                        13
2003 2003             193                         9
2004 2004             164                        11
2005 2005             183                         7
2006 2006             196                        10
2007 2007             176                         7
2008 2008             179                         7
2009 2009             178                         9
2010 2010               1                         1

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM