R - Group by 2 factor variables to compute quartiles

Question

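I have a dataset that holds record counts [reccount] collected by [hourtime] and [FeedCode]. What I want to do is create a column that tells me which quartile each record falls into (probs = 0:4/4), so that I can set an alert whenever anything falls below the first or second quartile. I can then investigate the feed to see whether something abnormal is going on.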

I first tried this, but realized it does not group by hour and feed code:

    df<-within(ds, quartile<-as.integer(cut(ds$reccount,quantile(ds$reccount,probs=0:4/4),include.lowest=TRUE)))

I then tried this, but it still does not return what I expect:

    as<-ddply(ds,.(as.factor(ds$hourtime),ds$FeedCode) , function(df)quantile(ds$reccount,probs=0:4/4))

I just need to add a column that classifies each row by the quartile it falls into. The data looks like this:

    dput(head(dss,30))
structure(list(rownames = c(2371L, 2428L, 2459L, 2493L, 2573L, 
2581L, 2606L, 2633L, 2668L, 2683L, 2693L, 2748L, 2756L, 2819L, 
2865L, 2889L, 2896L, 2970L, 2988L, 3005L, 3047L, 3067L, 3111L, 
3132L, 3154L, 3177L, 3209L, 3241L, 3272L), hourtime = c(3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), reccount = c(2864L, 
3492L, 968L, 3271L, 6078L, 767L, 1365L, 6222L, 2515L, 3986L, 
4327L, 5764L, 3676L, 5338L, 6407L, 1217L, 3058L, 5673L, 3569L, 
3391L, 3169L, 6446L, 4201L, 884L, 3529L, 6461L, 3414L, 3246L, 
5486L), FeedCode = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L), .Label = "MDSWJD", class = "factor"), quartile = c(4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L)), .Names = c("rownames", 
"hourtime", "reccount", "FeedCode", "quartile"), row.names = c(NA, 
29L), class = "data.frame")

Answer 1

You can use ave() to run cut/quantile within the grouping variables:

dss$quartile <- with(dss,
            ave(reccount, hourtime, FeedCode,
                # bin each group's counts by that group's own quartile breaks
                FUN = function(x) .bincode(x, quantile(x), right = TRUE, include.lowest = TRUE)
                )
            )
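To see how the grouped binning behaves, here is a minimal sketch on made-up data. The data frame `demo` and the column name `low_alert` are assumptions for illustration, not part of the asker's dataset, and the "two lowest bins" rule is just one reading of the alert described in the question.

```r
# Hypothetical toy data: two hours x two feeds, four records per group
demo <- data.frame(
  hourtime = rep(c(1L, 2L), each = 8),
  FeedCode = rep(rep(c("A", "B"), each = 4), times = 2),
  reccount = c(10, 20, 30, 40,   5, 50, 500, 5000,
               100, 200, 300, 400,   1, 2, 3, 4)
)

# Quartile bin (1 to 4) computed separately for each hourtime/FeedCode group
demo$quartile <- with(demo,
  ave(reccount, hourtime, FeedCode,
      FUN = function(x) .bincode(x, quantile(x), right = TRUE, include.lowest = TRUE)))

# Illustrative alert rule: flag rows that land in the two lowest bins
demo$low_alert <- demo$quartile <= 2

print(demo)
```

Because the breaks come from `quantile(x)` on each group's own values, feed "B" in hour 1 (5 to 5000) and feed "B" in hour 2 (1 to 4) are each binned relative to their own range rather than the overall one.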

Answer 2

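You confused me a little by talking about quartiles while having 5 groups {0,1,2,3,4}. I don't know whether I'm missing something, but here is a dplyr approach.

The first snippet computes Q25% for each {hourtime, FeedCode} group and flags everything below that 25% value. The second snippet splits the record counts within each group into 4 groups (quartiles) and assigns the group number {1 to 4}.

Run the code step by step and let me know if you spot any errors.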

library(dplyr)

# example dataset
dt = data.frame(hourtime = c(1,1,1,1,1,2,2,2,2,2),
                FeedCode = c("A","B","A","B","A","B","A","B","A","B"),
                reccount = c(946,184,1404,937,137,1199,698,1311,1302,560))   


dt %>% 
  group_by(hourtime, FeedCode) %>%
  mutate(Q25 = quantile(reccount,0.25),
         FlagBelowQ25 = ifelse(reccount < Q25, 1, 0)) %>%
  ungroup

#    hourtime FeedCode reccount    Q25 FlagBelowQ25
# 1         1        A      946 541.50            0
# 2         1        B      184 372.25            1
# 3         1        A     1404 541.50            0
# 4         1        B      937 372.25            0
# 5         1        A      137 541.50            1
# 6         2        B     1199 879.50            0
# 7         2        A      698 849.00            1
# 8         2        B     1311 879.50            0
# 9         2        A     1302 849.00            0
# 10        2        B      560 879.50            1


dt %>% 
  group_by(hourtime, FeedCode) %>%
  mutate(Quartile = ntile(reccount,4)) %>%
  ungroup

#    hourtime FeedCode reccount Quartile
# 1         1        A      946        2
# 2         1        B      184        1
# 3         1        A     1404        3
# 4         1        B      937        3
# 5         1        A      137        1
# 6         2        B     1199        2
# 7         2        A      698        1
# 8         2        B     1311        3
# 9         2        A     1302        3
# 10        2        B      560        1
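Since `ntile()` splits by rank, very small groups never reach bucket 4 in the output above. If bins based on the quantile breakpoints themselves are wanted (closer to the cut/quantile idea in the question), one possible sketch is to call `.bincode()` inside `mutate()`. The column names `QuartileBin` and `LowAlert` are made up for illustration, and the alert rule is an assumption about what "below the first or second quartile" should mean.

```r
library(dplyr)

# same example dataset as above
dt <- data.frame(hourtime = c(1,1,1,1,1,2,2,2,2,2),
                 FeedCode = c("A","B","A","B","A","B","A","B","A","B"),
                 reccount = c(946,184,1404,937,137,1199,698,1311,1302,560))

dt_binned <- dt %>%
  group_by(hourtime, FeedCode) %>%
  # bin by each group's own quantile breakpoints (0%, 25%, 50%, 75%, 100%)
  mutate(QuartileBin = .bincode(reccount, quantile(reccount),
                                right = TRUE, include.lowest = TRUE),
         # hypothetical alert rule: flag the two lowest bins
         LowAlert = QuartileBin <= 2) %>%
  ungroup()

dt_binned
```

With this variant the largest record in every group falls in bin 4 and the smallest in bin 1, however few rows the group has.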
