Filtering observations from time series conditionally by group

Question

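I have a data frame (df) containing multiple time series (value ~ time) whose observations are grouped by three factors: temperature, replicate, and species. These series need to be trimmed at their lower and upper ends, but the thresholds depend on the group (e.g., remove observations below 2 and above 10 where temp = 10, rep = 2, and species = "A"). I have a companion data frame (df_thresholds) that contains the grouping values along with the minimum and maximum to use for each group. Not every group needs trimming (I want to update this file periodically to guide where df gets trimmed). Can someone help me filter out these values conditionally by group? What I have below is close, but not quite right. When I reverse the max and min boolean tests, I get zero observations.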

# temp and time have 16 values; data.frame() recycles them to the 32 rows
df <- data.frame(species = c(rep("A", 16), rep("B", 16)),
                 temp=as.factor(c(rep(10,4),rep(20,4),rep(10,4),rep(20,4))),
                 rep=as.factor(c(rep(1,8),rep(2,8),rep(1,8),rep(2,8))),
                 time=rep(1:4,4),
                 value=c(1,4,8,16,2,4,9,16,2,4,10,16,2,4,15,16,2,4,6,16,1,4,8,16,1,2,8,16,2,3,4,16))

df_thresholds <- data.frame(species=c("A", "A", "B"), 
                            temp=as.factor(c(10,20,10)),
                            rep=as.factor(c(1,1,2)), 
                            min_value=c(2,4,2),
                            max_value=c(10,10,9))

#desired outcome
df_desired <- df[c(2:3,6:7,9:24,26:27,29:nrow(df)),]


#attempt (needs dplyr for %>% and filter())
library(dplyr)
df2 <- df

for (i in 1:nrow(df_thresholds)) {  
  df2 <- df2 %>%
    filter(!(species==df_thresholds$species[i] & temp==df_thresholds$temp[i] & rep==df_thresholds$rep[i] & value>df_thresholds$min_value[i] & value<df_thresholds$max_value[i]))
}
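The attempt removes the rows of each group whose value lies strictly between min_value and max_value, i.e. the in-range observations, instead of the out-of-range ones. Negating the out-of-range test gives the intended trimming. A minimal base-R sketch of that loop (no dplyr needed), assuming the example data above and inclusive bounds as implied by df_desired; the names th, in_group and out_of_range are illustrative, and keys are compared as character to avoid factor-level mismatches:

```r
df <- data.frame(species = c(rep("A", 16), rep("B", 16)),
                 temp = as.factor(c(rep(10, 4), rep(20, 4), rep(10, 4), rep(20, 4))),
                 rep = as.factor(c(rep(1, 8), rep(2, 8), rep(1, 8), rep(2, 8))),
                 time = rep(1:4, 4),
                 value = c(1, 4, 8, 16, 2, 4, 9, 16, 2, 4, 10, 16, 2, 4, 15, 16,
                           2, 4, 6, 16, 1, 4, 8, 16, 1, 2, 8, 16, 2, 3, 4, 16))
df_thresholds <- data.frame(species = c("A", "A", "B"),
                            temp = as.factor(c(10, 20, 10)),
                            rep = as.factor(c(1, 1, 2)),
                            min_value = c(2, 4, 2),
                            max_value = c(10, 10, 9))

df2 <- df
for (i in seq_len(nrow(df_thresholds))) {
  th <- df_thresholds[i, ]
  # rows belonging to the group described by this threshold row
  in_group <- as.character(df2$species) == as.character(th$species) &
              as.character(df2$temp) == as.character(th$temp) &
              as.character(df2$rep) == as.character(th$rep)
  # rows outside the inclusive range [min_value, max_value]
  out_of_range <- df2$value < th$min_value | df2$value > th$max_value
  # drop only rows that are both in the group and out of range
  df2 <- df2[!(in_group & out_of_range), ]
}
```

Groups without a row in df_thresholds are never matched by in_group, so they pass through untouched.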

Edit: here is the solution I implemented based on the suggestion below.

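library(dplyr)

# attach each group's bounds; groups with no row in df_thresholds get NA
df_test <- left_join(df, df_thresholds, by=c('species','temp','rep'))
# untrimmed groups: open bounds, so no observation is dropped whatever its value
df_test$min_value[is.na(df_test$min_value)] <- -Inf
df_test$max_value[is.na(df_test$max_value)] <- Inf

df_test2 <- df_test %>%
  filter(value >= min_value & value <= max_value)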
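The same join-then-filter idea can be sketched in base R with merge(), for readers who want to avoid a package dependency. This is an alternative, not the code the asker ran; it assumes the example data above and uses -Inf/Inf as open bounds for groups without thresholds. Note that merge() sorts rows by the key columns by default, so reorder afterwards if the original order matters.

```r
df <- data.frame(species = c(rep("A", 16), rep("B", 16)),
                 temp = as.factor(c(rep(10, 4), rep(20, 4), rep(10, 4), rep(20, 4))),
                 rep = as.factor(c(rep(1, 8), rep(2, 8), rep(1, 8), rep(2, 8))),
                 time = rep(1:4, 4),
                 value = c(1, 4, 8, 16, 2, 4, 9, 16, 2, 4, 10, 16, 2, 4, 15, 16,
                           2, 4, 6, 16, 1, 4, 8, 16, 1, 2, 8, 16, 2, 3, 4, 16))
df_thresholds <- data.frame(species = c("A", "A", "B"),
                            temp = as.factor(c(10, 20, 10)),
                            rep = as.factor(c(1, 1, 2)),
                            min_value = c(2, 4, 2),
                            max_value = c(10, 10, 9))

# left join: all.x = TRUE keeps groups with no thresholds (their bounds become NA)
m <- merge(df, df_thresholds, by = c("species", "temp", "rep"), all.x = TRUE)
# replace missing bounds with open ones so those groups are kept whole
m$min_value[is.na(m$min_value)] <- -Inf
m$max_value[is.na(m$max_value)] <- Inf
# keep observations inside the inclusive range
trimmed <- m[m$value >= m$min_value & m$value <= m$max_value, ]
```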

Answer 1

We can use mapply to find the rows to exclude.

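# for each row of df_thresholds, which() returns the df rows of that group
# lying outside [min_value, max_value]; c() flattens them and -c(...) drops them
df[-c(with(df_thresholds,
           mapply(function(x, y, z, min_x, max_x)
                    which(df$species == x & df$temp == y & df$rep == z &
                            (df$value < min_x | df$value > max_x)),
                  species, temp, rep, min_value, max_value))), ]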


#   species temp rep time value
#2        A   10   1    2     4
#3        A   10   1    3     8
#6        A   20   1    2     4
#7        A   20   1    3     9
#9        A   10   2    1     2
#10       A   10   2    2     4
#11       A   10   2    3    10
#12       A   10   2    4    16
#......

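Inside mapply we pass each column of df_thresholds, use them to filter df group by group, find the indices of the rows whose value lies outside that group's minimum and maximum, and then exclude those rows from the original data frame.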

The result of the mapply call, flattened by c(), is

#[1]  1  4  5  8 25 28

which are the rows we want to exclude from df because they fall out of range.
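One caveat with the df[-c(...), ] idiom: it relies on mapply simplifying to a matrix, which happens here only because every group drops exactly two rows. If groups drop different numbers of rows, mapply returns a list and the unary minus fails; if no row is dropped at all, df[-integer(0), ] returns an empty data frame rather than all of df. A sketch that guards against both, assuming the same example data: SIMPLIFY = FALSE plus unlist() always yields a plain index vector, and setdiff() builds the kept indices even when nothing is dropped. The names drop_rows and kept are illustrative.

```r
df <- data.frame(species = c(rep("A", 16), rep("B", 16)),
                 temp = as.factor(c(rep(10, 4), rep(20, 4), rep(10, 4), rep(20, 4))),
                 rep = as.factor(c(rep(1, 8), rep(2, 8), rep(1, 8), rep(2, 8))),
                 time = rep(1:4, 4),
                 value = c(1, 4, 8, 16, 2, 4, 9, 16, 2, 4, 10, 16, 2, 4, 15, 16,
                           2, 4, 6, 16, 1, 4, 8, 16, 1, 2, 8, 16, 2, 3, 4, 16))
df_thresholds <- data.frame(species = c("A", "A", "B"),
                            temp = as.factor(c(10, 20, 10)),
                            rep = as.factor(c(1, 1, 2)),
                            min_value = c(2, 4, 2),
                            max_value = c(10, 10, 9))

# one integer vector of out-of-range row indices per threshold row,
# concatenated by unlist(); keys compared as character for safety
drop_rows <- unlist(mapply(
  function(x, y, z, min_x, max_x)
    which(as.character(df$species) == x & as.character(df$temp) == y &
            as.character(df$rep) == z &
            (df$value < min_x | df$value > max_x)),
  as.character(df_thresholds$species), as.character(df_thresholds$temp),
  as.character(df_thresholds$rep), df_thresholds$min_value,
  df_thresholds$max_value,
  SIMPLIFY = FALSE, USE.NAMES = FALSE))

# setdiff() keeps every row when drop_rows is empty
kept <- df[setdiff(seq_len(nrow(df)), drop_rows), ]
```

With the example data, drop_rows holds the same six indices as the original answer.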

Solution 1
0 2018-12-11 02:50:12