
Extract hourly values (and their indexes) that exceed monthly averages in R

I have two data frames, df and df1. Both are large (they cover data from 01-01-2000 to 31-10-2019), so I have uploaded only a small sample.

df contains the hourly values of 3 variables together with their corresponding date vector, as shown below:

           date           SH1          SH2         SH3        
     2000-01-01 00:00:00 1.941013e-01 1.506780e-01 0.124487891 
     2000-01-01 03:00:00 2.897915e-01 2.743722e-01 0.188432490 
     2000-01-01 06:00:00 3.139408e-01 2.250532e-01 0.001473900 
     2000-01-01 09:00:00 1.845777e-01 1.041934e-01 0.047391565 
     2000-01-01 12:00:00 1.022660e-01 6.179044e-02 0.008843402 
     

df <- structure(list(datex = structure(c(946681200, 946692000, 946702800, 
946713600, 946724400), class = c("POSIXct", "POSIXt")), SH1 = c(0.194101337780203, 
0.289791483274648, 0.313940773547535, 0.184577674010614, 0.102266008573448
), SH2 = c(0.150677966068861, 0.274372218123884, 0.225053245031368, 
0.104193416717294, 0.0617904375526934), SH3 = c(0.12448789070249, 
0.188432490298304, 0.00147390034529415, 0.0473915649486711, 0.00884340207176182
)), class = c("data.table", "data.frame"), row.names = c(NA, 
-5L))

df1 holds the monthly averages of the same data (one row per month of each year). It looks like this:

      date       SH1       SH2       SH3       
  2000-01-01 0.7733497 0.6237698 0.4182768 
  2000-02-01 0.7308772 0.5575175 0.3636893  
  2000-03-01 0.3278784 0.3040463 0.2233942  
  2000-04-01 0.4496596 0.3124064 0.1805953  
  2000-05-01 0.4500503 0.4032727 0.2562054  

df1 <- structure(list(datex = structure(c(10957, 10988, 11017, 11048, 
11078), class = "Date"), SH1 = c(0.773349659462019, 0.730877175434939, 
0.327878366545974, 0.44965959591958, 0.450050258753037), SH2 = c(0.623769804010216, 
0.557517466419755, 0.304046348866025, 0.312406405495768, 0.403272666559865
), SH3 = c(0.418276825782115, 0.36368930844493, 0.223394192812674, 
0.18059530865458, 0.256205390604878)), row.names = c(NA, -5L), class = c("data.table", 
"data.frame"))

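I want to extract the values in df that exceed their corresponding monthly average (df1), and get the positions (indexes) of those values. How can I do this? I am not an R expert, so please bear with me.

I think the comparison has to be made on the year and month in both datasets, but I don't know how to go about it.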

I would do something along these lines:

library(dplyr)
library(lubridate)  # provides year() and month()

# Note: the dput() sample names the timestamp column `datex`; use that name if it matches your data.
df %>% 
  group_by(year = year(date), month = month(date)) %>% 
  mutate(
    # mean of each variable within its year-month group
    monthly_average_SH1 = mean(SH1),
    monthly_average_SH2 = mean(SH2),
    monthly_average_SH3 = mean(SH3),
    # TRUE where the value is above the mean of its month
    flag_exceed_SH1 = SH1 > monthly_average_SH1,
    flag_exceed_SH2 = SH2 > monthly_average_SH2,
    flag_exceed_SH3 = SH3 > monthly_average_SH3,
    flag_any_exceed = flag_exceed_SH1 | flag_exceed_SH2 | flag_exceed_SH3
  ) %>% 
  ungroup() %>% 
  filter(flag_any_exceed)

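This gives you a df containing every row in which at least one variable exceeds its monthly mean. Note that df1 is not needed, because you can compute the means from df itself.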
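If you would rather compare against the precomputed averages in df1, one possible approach is to build a year-month key on both tables and look up each hourly row's month. The following is only a base R sketch under assumptions: the data frames `hourly` and `monthly` and their values are made up for illustration (they are not the asker's data), and the timestamp column is assumed to be called `date` in both.

```r
# Illustrative sample data (hypothetical values, not taken from the question).
hourly <- data.frame(
  date = as.POSIXct(c("2000-01-01 00:00:00", "2000-01-01 03:00:00",
                      "2000-02-01 00:00:00", "2000-02-01 03:00:00"), tz = "UTC"),
  SH1 = c(0.2, 0.9, 0.8, 0.1),
  SH2 = c(0.7, 0.1, 0.5, 0.4)
)
monthly <- data.frame(
  date = as.Date(c("2000-01-01", "2000-02-01")),
  SH1 = c(0.5, 0.7),
  SH2 = c(0.6, 0.55)
)

vars <- c("SH1", "SH2")

# Build a "YYYY-MM" key on both sides, then find the monthly row for each hourly row.
hourly_key     <- format(hourly$date, "%Y-%m")
monthly_key    <- format(monthly$date, "%Y-%m")
row_in_monthly <- match(hourly_key, monthly_key)

# Logical matrix: TRUE where an hourly value exceeds the average of its month.
exceeds <- as.matrix(hourly[vars]) > as.matrix(monthly[row_in_monthly, vars])

# Row indexes where at least one variable exceeds its monthly average.
exceed_rows <- unname(which(rowSums(exceeds) > 0))

# The matching rows of the hourly table.
hourly[exceed_rows, ]
```

Hourly rows whose month is missing from the monthly table get `NA` from `match()` and are dropped by `which()`, so they are never reported as exceeding.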

If you want the indexes:

df_2 <- df %>% 
  group_by(year = year(date), month = month(date)) %>% 
  mutate(
    monthly_average_SH1 = mean(SH1),
    monthly_average_SH2 = mean(SH2),
    monthly_average_SH3 = mean(SH3),
    flag_exceed_SH1 = SH1 > monthly_average_SH1,
    flag_exceed_SH2 = SH2 > monthly_average_SH2,
    flag_exceed_SH3 = SH3 > monthly_average_SH3,
    flag_any_exceed = flag_exceed_SH1 | flag_exceed_SH2 | flag_exceed_SH3
  ) 
# row positions where at least one variable exceeds its monthly mean
which(df_2$flag_any_exceed)
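The call above returns the rows where any variable exceeds its mean. If the positions are needed separately for each variable, a base R sketch along the following lines might work. It assumes made-up sample data (the `hourly` data frame and its values are hypothetical) and computes the monthly means from the hourly data itself with `ave()`.

```r
# Illustrative sample data (hypothetical values, not taken from the question).
hourly <- data.frame(
  date = as.POSIXct(c("2000-01-01 00:00:00", "2000-01-15 12:00:00",
                      "2000-02-01 00:00:00", "2000-02-10 06:00:00"), tz = "UTC"),
  SH1 = c(0.2, 0.4, 0.9, 0.1),
  SH2 = c(0.5, 0.1, 0.2, 0.6)
)

vars <- c("SH1", "SH2")

# One "YYYY-MM" label per row, used as the grouping key.
month_key <- format(hourly$date, "%Y-%m")

# For each variable: row indexes whose value is above the mean of its own month.
# ave() returns the group mean repeated for every row of the group.
exceed_idx <- lapply(hourly[vars], function(x) {
  which(x > ave(x, month_key, FUN = mean))
})

exceed_idx$SH1  # positions for SH1
exceed_idx$SH2  # positions for SH2
```

The result is a named list with one integer vector per variable, so `hourly$SH1[exceed_idx$SH1]` gives the exceeding values themselves.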

Hope this helps.

