Summing up a data frame according to a column in R
I am trying to sum the following data frame by city:
> summary(dat1)
Date City Sales
Min. :2010-06-18 Min. : 1.00 Min. : 667.4
1st Qu.:2011-02-18 1st Qu.:18.00 1st Qu.: 1138.6
Median :2011-10-28 Median :37.00 Median : 1507.5
Mean :2011-10-29 Mean :44.26 Mean : 2065.4
3rd Qu.:2012-07-06 3rd Qu.:74.00 3rd Qu.: 2347.1
Max. :2013-03-08 Max. :99.00 Max. :47206.6
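That is, I want a data frame with one observation per Date x City combination, showing the total sales for each city on each day.
There are several possibilities. To name just a few:
The function aggregate():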
i) aggregate(Sales~Date+City, data=df, sum)
ii) aggregate(df$Sales, list(df$Date,df$City), sum)
The function tapply():
i) tapply(df$Sales, list(df$Date, df$City), sum)
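The function tapply() is particularly useful if your data set is large: aggregate() can choke on very large data sets, whereas tapply() usually copes with such data better. Also, tapply() and aggregate() produce their output in different formats, so you may want to choose the one best suited to your further analysis.
These examples can be tested on the simulated data shown below: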
df<-structure(list(Date = structure(c(4L, 2L, 4L, 2L, 3L, 4L, 3L,
2L, 2L, 2L, 2L, 4L, 1L, 4L, 2L, 4L, 2L, 3L, 4L, 2L, 3L, 3L, 4L,
3L, 4L, 2L, 2L, 2L, 3L, 1L, 1L, 4L, 2L, 4L, 1L, 2L, 1L, 2L, 3L,
2L, 2L, 3L, 2L, 1L, 1L, 3L, 2L, 1L, 1L, 3L, 3L, 1L, 3L, 1L, 1L,
1L, 3L, 2L, 3L, 1L, 3L, 3L, 2L, 2L, 4L, 2L, 1L, 3L, 3L, 1L, 4L,
1L, 2L, 2L, 1L, 2L, 2L, 2L), .Label = c("2014-01-01", "2014-02-01",
"2014-03-01", "2014-04-01"), class = "factor"), City = structure(c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L,
16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 1L, 2L,
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L,
17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 1L, 2L, 3L,
4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L,
18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L), .Label = c("a",
"b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n",
"o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"), class = "factor"),
Sales = c(100, 100, 93, 92, 95, 115, 104, 106, 113, 94, 93,
98, 116, 85, 98, 97, 103, 110, 105, 104, 107, 86, 92, 94,
106, 115, 112, 92, 103, 100, 101, 97, 95, 110, 103, 92, 91,
98, 100, 93, 108, 87, 96, 101, 87, 111, 90, 94, 110, 95,
110, 101, 88, 99, 106, 117, 101, 120, 92, 86, 118, 104, 99,
89, 103, 102, 121, 99, 106, 99, 107, 105, 109, 110, 112,
94, 100, 112)), .Names = c("Date", "City", "Sales"), row.names = c(NA,
-78L), class = "data.frame")
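The difference in output format mentioned above can be seen on a much smaller data frame. The sketch below is only an illustration: `toy` and its values are made up here and are not part of the asker's data.

```r
# Hypothetical toy data (not the asker's dat1, not the simulated df above)
toy <- data.frame(
  Date  = c("2014-01-01", "2014-01-01", "2014-01-01", "2014-02-01"),
  City  = c("a", "a", "b", "a"),
  Sales = c(10, 5, 7, 3)
)

# aggregate(): long format, one row per Date/City pair that occurs in the data
long <- aggregate(Sales ~ Date + City, data = toy, FUN = sum)

# tapply(): wide format, a Date-by-City matrix;
# pairs that never occur in the data are filled with NA
wide <- tapply(toy$Sales, list(toy$Date, toy$City), sum)

print(long)
print(wide)
```

The long form is usually easier to merge or plot, while the matrix is convenient for looking up a single date and city, e.g. `wide["2014-01-01", "a"]`.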
See the aggregate() function:
aggregate(Sales~Date+City, data=dat1, sum)
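If only one total per city is wanted, regardless of the date, dropping Date from the formula should be enough. A minimal sketch, again on made-up data (`toy` is a hypothetical stand-in for dat1):

```r
# Hypothetical toy data standing in for dat1
toy <- data.frame(
  Date  = c("2014-01-01", "2014-01-01", "2014-01-01", "2014-02-01"),
  City  = c("a", "a", "b", "a"),
  Sales = c(10, 5, 7, 3)
)

# One row per city: total sales summed over all dates
by_city <- aggregate(Sales ~ City, data = toy, FUN = sum)

# The same totals as a named vector
by_city_vec <- tapply(toy$Sales, toy$City, sum)

print(by_city)
print(by_city_vec)
```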