Using R aggregate() function, how do I count distinct values?
My data looks like this:
str(defects)
## 'data.frame': 22540 obs. of 8 variables:
## $ BUG_ID : int 2237 2239 2163 2163 2163 2163 2163 2163 2163 2163 ...
## $ STATUS : Factor w/ 5 levels "Assigned","Closed",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ SEVERITY : Factor w/ 4 levels "1-Low","2-Medium",..: 4 3 3 3 3 3 3 3 3 3 ...
## $ DETECTION_DATE : Date, format: "2017-10-31" "2017-10-31" ...
## $ ACTUAL_FIX_TIME: int 1 1 20 20 20 20 20 20 20 20 ...
## $ CLOSING_DATE : Date, format: "2017-10-31" "2017-10-31" ...
## $ DATE : Date, format: "2017-10-31" "2017-10-31" ...
## $ NOR : int 1 1 1 1 1 1 1 1 1 1 ...
I need to compute the following using the aggregate() function:
COUNT_DISTINCT(
IF [CLOSING_DATE] == [DATE] THEN
[BUG_ID]
END
)
This is what I have:
aggregate(unique(BUG_ID) ~ DATE, defects, subset = CLOSING_DATE == DATE, length)
I came up with:
aggregate(CLOSED_DEFECTS ~ DATE,
          data = within(defects, CLOSED_DEFECTS <- ifelse(CLOSING_DATE == DATE, BUG_ID, NA)),
          # Count distinct non-NA IDs; subtracting 1 is only correct when the group contains an NA.
          FUN = function(x) length(unique(x[!is.na(x)])),
          na.action = na.pass)
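A possibly simpler route is to let `subset` do the filtering and keep `unique()` inside the summary function rather than in the formula; `unique(BUG_ID)` on the left-hand side most likely fails because its length no longer matches `DATE`. The sketch below is only an illustration: the toy `defects` values are made up, and the names `closed`, `all_dates`, and `result` are hypothetical helpers, not part of the original post.

```r
# Toy data (made-up values) with the same relevant columns as `defects`.
defects <- data.frame(
  BUG_ID       = c(1L, 1L, 2L, 3L, 3L, 4L),
  DATE         = as.Date(c("2017-10-30", "2017-10-31", "2017-10-31",
                           "2017-10-31", "2017-10-31", "2017-11-01")),
  CLOSING_DATE = as.Date(c("2017-10-31", "2017-10-31", "2017-10-31",
                           "2017-10-31", "2017-10-31", "2017-11-02"))
)

# Keep only rows closed on their own DATE, then count distinct BUG_IDs per DATE.
closed <- aggregate(
  BUG_ID ~ DATE,
  data   = defects,
  subset = CLOSING_DATE == DATE,
  FUN    = function(x) length(unique(x))
)
names(closed)[2] <- "CLOSED_DEFECTS"

# Dates with no matching rows are dropped by the subset, so join them back
# and fill the gaps with 0 if a row per DATE is wanted.
all_dates <- data.frame(DATE = unique(defects$DATE))
result <- merge(all_dates, closed, all.x = TRUE)
result$CLOSED_DEFECTS[is.na(result$CLOSED_DEFECTS)] <- 0L

print(result)
```

The trade-off is that the `subset` version needs the extra `merge()` step to report zero counts, whereas the `ifelse(..., NA)` version keeps every DATE in a single call.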