How to summarize date data by groups in R
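I'd like to summarize the sample data below into a new data frame with these columns:
Population, Sample Size (N), Percent Completed (%)
Sample Size is the count of all records for each population. I can get that with table or tapply. Percent Completed is the percentage of records that have an End_Date (every record without an End_Date is assumed not to have completed). This is where I get lost!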
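For the table()/tapply() route, a minimal base-R sketch could look like this. It runs on a small hypothetical data frame df (not the question's data) with the same columns. The step that handles the percentage is that !is.na(End_Date) is TRUE for finished records, and mean() of a logical vector gives the share of TRUE values.

```r
# Hypothetical toy data with the same columns as the question's data frame
df <- data.frame(
  Population = factor(c("A", "A", "A", "B", "B")),
  Start_Date = as.Date(c("2013-11-01", "2013-11-01", "2013-11-02",
                         "2013-11-03", "2013-11-03")),
  End_Date   = as.Date(c("2013-11-05", NA, NA, "2013-11-06", "2013-11-07"))
)

# Sample size: number of records per population
n <- table(df$Population)

# Percent completed: !is.na() is TRUE for records with an End_Date, and the
# mean of a logical vector is the proportion of TRUE values
pct <- tapply(!is.na(df$End_Date), df$Population, mean) * 100

# Combine both per-group results into one data frame, matched by group name
result <- data.frame(
  Population        = names(n),
  Sample_Size       = as.vector(n),
  Percent_Completed = as.vector(pct[names(n)]),
  stringsAsFactors  = FALSE
)
print(result)
```

Matching pct to n by name rather than by position keeps the two vectors aligned even if one of them were ordered differently.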
Sample data
sample <- structure(list(Population = structure(c(1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 3L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 3L, 3L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 3L), .Label = c("Glommen",
"Kaseberga", "Steninge"), class = "factor"), Start_Date = structure(c(16032,
16032, 16032, 16032, 16032, 16036, 16036, 16036, 16037, 16038,
16038, 16039, 16039, 16039, 16039, 16039, 16039, 16041, 16041,
16041, 16041, 16041, 16041, 16044, 16044, 16045, 16045, 16045,
16045, 16048, 16048, 16048, 16048, 16048, 16048), class = "Date"),
End_Date = structure(c(NA, 16037, NA, NA, 16036, 16043, 16040,
16041, 16042, 16042, 16042, 16043, 16043, 16043, 16043, 16043,
16043, 16045, 16045, 16045, 16045, 16045, NA, 16048, 16048,
16049, 16049, NA, NA, 16052, 16052, 16052, 16052, 16052,
16052), class = "Date")), .Names = c("Population", "Start_Date",
"End_Date"), row.names = c(NA, 35L), class = "data.frame")
You can do this with split/apply/combine:
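# Split: one data frame per population
spl = split(sample, sample$Population)
# Apply: summarise each piece as a one-row data frame
new.rows = lapply(spl, function(x) data.frame(Population=x$Population[1],
                                              SampleSize=nrow(x),
                                              PctComplete=sum(!is.na(x$End_Date))/nrow(x)))
# Combine: stack the one-row summaries back together
combined = do.call(rbind, new.rows)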
combined
#           Population SampleSize PctComplete
# Glommen      Glommen         13   0.6923077
# Kaseberga  Kaseberga          7   1.0000000
# Steninge    Steninge         15   0.8666667
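The same split/apply/combine pattern is also available as a single call through base R's aggregate(). Below is a minimal sketch on a hypothetical toy data frame df. It uses the data-frame interface on purpose: the formula interface (aggregate(End_Date ~ Population, ...)) defaults to na.action = na.omit and would drop the unfinished records before they are counted.

```r
# Hypothetical toy data with the same columns the summary needs
df <- data.frame(
  Population = factor(c("A", "A", "A", "B", "B")),
  End_Date   = as.Date(c("2013-11-05", NA, NA, "2013-11-06", "2013-11-07"))
)

# Split a logical "has an End_Date" column by population and apply FUN to
# each piece; because FUN returns two named values, aggregate() stores the
# results as a two-column matrix column called Complete
agg <- aggregate(
  x   = list(Complete = !is.na(df$End_Date)),
  by  = list(Population = df$Population),
  FUN = function(v) c(SampleSize = length(v), PctComplete = mean(v))
)

# Combine: flatten the matrix column into ordinary columns
combined <- data.frame(Population = agg$Population, agg$Complete)
print(combined)
```

As in the lapply() version, PctComplete here is a proportion; multiply by 100 if a percentage is wanted.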
A word of warning: sample is the name of a base R function, so you should pick a different name for your data frame.
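The point of this warning is clarity, not breakage. When R evaluates a call such as sample(1:3), it skips non-function objects while looking up the name, so base::sample() keeps working. The following short sketch shows that behaviour and the rename. The name survey is only a hypothetical replacement.

```r
# A data frame that reuses the name of the base function sample()
sample <- data.frame(x = 1:3)

# R skips non-function objects when resolving the name in a call,
# so this still calls base::sample() and returns a permutation of 1:3
draw <- sample(1:3)

# Clearer: keep the data under a different (hypothetical) name
survey <- sample
rm(sample)   # the name sample now refers only to the base function again
```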
This is easy with the plyr package:
library(plyr)
ddply(sample, .(Population), summarize,
      Sample_Size = length(End_Date),
      Percent_Completed = mean(!is.na(End_Date)) * 100)
#   Population Sample_Size Percent_Completed
# 1    Glommen          13          69.23077
# 2  Kaseberga           7         100.00000
# 3   Steninge          15          86.66667
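The two answers compute the percentage differently: the first uses sum(!is.na(x$End_Date))/nrow(x), and this one uses mean(!is.na(End_Date)) * 100. They agree because mean() treats TRUE as 1 and FALSE as 0. Here is a small check on a hypothetical End_Date vector:

```r
# Hypothetical End_Date values: two records finished, two not
end_date <- as.Date(c("2013-11-05", NA, "2013-11-06", NA))

completed <- !is.na(end_date)   # TRUE where an End_Date was recorded

# Explicit ratio, as in the split/apply/combine answer
ratio_pct <- sum(completed) / length(completed) * 100
# mean() of a logical vector: TRUE counts as 1, FALSE as 0
mean_pct <- mean(completed) * 100

print(c(ratio_pct, mean_pct))
```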