Extracting hourly values (and their indices) that exceed the monthly average in R

Question

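I have two data frames, df and df1. Both data files are large (covering 01-01-2000 to 31-10-2019), so I have uploaded only a small sample.

df contains hourly values of 3 variables together with their corresponding date vector, as shown below: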

           date           SH1          SH2         SH3        
     2000-01-01 00:00:00 1.941013e-01 1.506780e-01 0.124487891 
     2000-01-01 03:00:00 2.897915e-01 2.743722e-01 0.188432490 
     2000-01-01 06:00:00 3.139408e-01 2.250532e-01 0.001473900 
     2000-01-01 09:00:00 1.845777e-01 1.041934e-01 0.047391565 
     2000-01-01 12:00:00 1.022660e-01 6.179044e-02 0.008843402 
     

df <- structure(list(datex = structure(c(946681200, 946692000, 946702800, 
946713600, 946724400), class = c("POSIXct", "POSIXt")), SH1 = c(0.194101337780203, 
0.289791483274648, 0.313940773547535, 0.184577674010614, 0.102266008573448
), SH2 = c(0.150677966068861, 0.274372218123884, 0.225053245031368, 
0.104193416717294, 0.0617904375526934), SH3 = c(0.12448789070249, 
0.188432490298304, 0.00147390034529415, 0.0473915649486711, 0.00884340207176182
)), class = c("data.table", "data.frame"), row.names = c(NA, 
-5L))

df1 holds the monthly averages of the same data (for each year). It looks like this:

      date       SH1       SH2       SH3       
  2000-01-01 0.7733497 0.6237698 0.4182768 
  2000-02-01 0.7308772 0.5575175 0.3636893  
  2000-03-01 0.3278784 0.3040463 0.2233942  
  2000-04-01 0.4496596 0.3124064 0.1805953  
  2000-05-01 0.4500503 0.4032727 0.2562054  

df1 <- structure(list(datex = structure(c(10957, 10988, 11017, 11048, 
11078), class = "Date"), SH1 = c(0.773349659462019, 0.730877175434939, 
0.327878366545974, 0.44965959591958, 0.450050258753037), SH2 = c(0.623769804010216, 
0.557517466419755, 0.304046348866025, 0.312406405495768, 0.403272666559865
), SH3 = c(0.418276825782115, 0.36368930844493, 0.223394192812674, 
0.18059530865458, 0.256205390604878)), row.names = c(NA, -5L), class = c("data.table", 
"data.frame"))

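I want to extract the values in df that exceed their corresponding monthly average (df1) and get the positions (indices) of those values. How can I do this? I am not an R expert, so please bear with me.

I think the comparison has to be made on the year and month in both datasets, but I do not know how to go about it.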

Answer 1

I would do something along these lines:

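library(dplyr)      # group_by(), mutate(), filter(), %>%
library(lubridate)  # year(), month()

# Note: the dput() sample names the timestamp column `datex`;
# use `date` instead if that is the column name in your data.
df %>% 
  group_by(year = year(datex), month = month(datex)) %>% 
  mutate(
    # mean of each variable within the current year/month group
    monthly_average_SH1 = mean(SH1),
    monthly_average_SH2 = mean(SH2),
    monthly_average_SH3 = mean(SH3),
    # TRUE where the hourly value is above its monthly mean
    flag_exceed_SH1 = SH1 > monthly_average_SH1,
    flag_exceed_SH2 = SH2 > monthly_average_SH2,
    flag_exceed_SH3 = SH3 > monthly_average_SH3,
    flag_any_exceed = flag_exceed_SH1 | flag_exceed_SH2 | flag_exceed_SH3
  ) %>% 
  filter(flag_any_exceed)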

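This gives you a data frame containing every row in which at least one variable exceeds its monthly average. Note that df1 is not required, because you can compute the means from df itself.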
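If you would rather compare against the precomputed averages in df1, as the question suggests, the following is a minimal sketch in base R (so it runs without extra packages). It rebuilds the question's sample as plain data frames with rounded values and assumes the timestamps are in UTC; `vars`, `month_row`, `exceeds` and `positions` are names made up for this illustration. The idea is to build a "YYYY-MM" key on both sides and look up each hourly row's month in df1.

```r
# Plain data frames rebuilt from the question's printed sample (values rounded).
# Assumption: timestamps are UTC; the time zone decides which month a row falls in.
df <- data.frame(
  datex = as.POSIXct(c("2000-01-01 00:00:00", "2000-01-01 03:00:00",
                       "2000-01-01 06:00:00", "2000-01-01 09:00:00",
                       "2000-01-01 12:00:00"), tz = "UTC"),
  SH1 = c(0.1941013, 0.2897915, 0.3139408, 0.1845777, 0.1022660),
  SH2 = c(0.1506780, 0.2743722, 0.2250532, 0.1041934, 0.06179044),
  SH3 = c(0.124487891, 0.188432490, 0.001473900, 0.047391565, 0.008843402)
)
df1 <- data.frame(
  datex = as.Date(c("2000-01-01", "2000-02-01", "2000-03-01",
                    "2000-04-01", "2000-05-01")),
  SH1 = c(0.7733497, 0.7308772, 0.3278784, 0.4496596, 0.4500503),
  SH2 = c(0.6237698, 0.5575175, 0.3040463, 0.3124064, 0.4032727),
  SH3 = c(0.4182768, 0.3636893, 0.2233942, 0.1805953, 0.2562054)
)

vars <- c("SH1", "SH2", "SH3")

# For every hourly row, find the row of df1 that holds the same year and month
month_row <- match(format(df$datex, "%Y-%m"), format(df1$datex, "%Y-%m"))

# Logical matrix: TRUE where an hourly value is above df1's average for its month
# (rows whose month is missing from df1 would come out as NA)
exceeds <- as.matrix(df[vars]) > as.matrix(df1[month_row, vars])

# Positions of the exceedances: `row` indexes df, `col` indexes vars
positions <- which(exceeds, arr.ind = TRUE)

# The exceeding rows themselves
df[rowSums(exceeds, na.rm = TRUE) > 0, ]
```

On this five-row sample nothing exceeds the January averages from df1, so `positions` is empty. That differs from the grouped-mean approach, because a mean computed from a small extract is not the mean of the full month.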

If you want the indices:

# assumes dplyr and lubridate are loaded; the dput() sample names the
# timestamp column `datex` (use `date` if that is your column name)
df_2 <- df %>% 
  group_by(year = year(datex), month = month(datex)) %>% 
  mutate(
    monthly_average_SH1 = mean(SH1),
    monthly_average_SH2 = mean(SH2),
    monthly_average_SH3 = mean(SH3),
    flag_exceed_SH1 = SH1 > monthly_average_SH1,
    flag_exceed_SH2 = SH2 > monthly_average_SH2,
    flag_exceed_SH3 = SH3 > monthly_average_SH3,
    flag_any_exceed = flag_exceed_SH1 | flag_exceed_SH2 | flag_exceed_SH3
  ) 
# row positions in df where at least one variable exceeds its monthly mean
which(df_2$flag_any_exceed)
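The call above returns one set of row numbers for "any variable exceeds". If you need the positions per variable, a minimal base R sketch could look like the following. It uses `ave()` to compute the group mean per year-month instead of dplyr, rebuilds the question's sample with rounded values, and assumes UTC timestamps; `vars`, `month_key` and `idx` are illustrative names.

```r
# Sample rebuilt from the question's printed table (values rounded).
# Assumption: timestamps are UTC.
df <- data.frame(
  datex = as.POSIXct(c("2000-01-01 00:00:00", "2000-01-01 03:00:00",
                       "2000-01-01 06:00:00", "2000-01-01 09:00:00",
                       "2000-01-01 12:00:00"), tz = "UTC"),
  SH1 = c(0.1941013, 0.2897915, 0.3139408, 0.1845777, 0.1022660),
  SH2 = c(0.1506780, 0.2743722, 0.2250532, 0.1041934, 0.06179044),
  SH3 = c(0.124487891, 0.188432490, 0.001473900, 0.047391565, 0.008843402)
)

vars <- c("SH1", "SH2", "SH3")

# One "YYYY-MM" label per row, used as the grouping key
month_key <- format(df$datex, "%Y-%m")

# For each variable: row numbers whose value is above that month's mean.
# ave() returns the group mean repeated for every row of the group.
idx <- lapply(vars, function(v) which(df[[v]] > ave(df[[v]], month_key)))
names(idx) <- vars

idx$SH1  # rows 2 and 3 in this sample
idx$SH3  # rows 1 and 2 in this sample
```

Each element of `idx` can then be used to subset df directly, for example `df[idx$SH1, c("datex", "SH1")]`.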

I hope this is useful.

Solution 1
0 · Accepted · 2020-09-03 09:48:42