How to aggregate results by date (month, year, day) and subgroup using data.table

Question

Using R version 3.1.3, I'm attempting to count events in event log data.

I have a data set of timestamped events. I've cleaned the data and loaded it into a data.table for easier manipulation.

The column names are OrderDate, EventDate, OrderID, EventTypeID, LocationID and EncounterID.

The events are nested as follows: each EncounterID has multiple OrderIDs, and each OrderID has multiple eventIDs.

Example data:

library(data.table) 
DT <- fread("OrderDate,EventDate,OrderID,EventTypeID,LocationID,EncounterID 
1/12/2012 5:40,01/12/2012 05:40,100001,12344,1,5998887
1/12/2012 5:40,01/12/2012 05:49,100001,12345,1,5998887
1/12/2012 5:40,01/12/2012 06:40,100001,12345,1,5998887
1/12/2012 5:45,01/12/2012 05:45,100002,12344,1,5998887
1/12/2012 5:45,01/12/2012 05:49,100002,12345,1,5998887
1/12/2012 5:45,01/12/2012 06:40,100002,12345,1,5998887
1/12/2012 5:46,01/12/2012 05:46,100003,12344,2,5948887
1/12/2012 5:46,01/12/2012 05:49,100003,12345,2,5948887
1/12/2013 7:40,01/12/2013 07:40,123001,12345,2,6008887
1/12/2013 7:40,01/12/2013 07:41,123001,12346,2,6008887
1/12/2013 7:40,01/12/2013 07:50,123001,12345,2,6008887
1/12/2013 7:40,01/12/2013 07:55,123001,12345,2,6008887")


DT$OrderDate <- as.POSIXct(DT$OrderDate, format="%d/%m/%Y %H:%M")
DT$EventDate <- as.POSIXct(DT$EventDate, format="%d/%m/%Y %H:%M")

My ultimate goal is to explore this data visually using ggplot2, looking at the count of various combinations by month, but I'm having trouble aggregating the data with data.table.

My specific question (one example): how can I generate a table with the following columns: Month-Year, LocationID, Count_of_Orders?

If I do the following:

DT[,.N,by=.(month(OrderDate),year(OrderDate))]

I get a count of all the events, but I need the count of distinct OrderIDs per month per LocationID.

   month year N
1:    12 2012 8
2:    12 2013 4

What I'm looking for is N by Month-Year and LocationID:

Month-Year,LocationID,Count_of_orders
01-12,1,2
01-12,2,1
01-13,1,0
01-13,2,1

NOTE: any location that has no orders in a given month should still be listed, with a count of zero. The set of locations therefore needs to be determined by generating a list of unique LocationIDs.

Can someone please provide a solution?

Thanks

Answer 1

I'm assuming your date/times are in POSIXct format (since you call month / year). Then, with d standing for your data.table:

d[, month.year := format(OrderDate, '%m-%y')]

setkey(d, month.year, LocationID, OrderID)

# by = key(d) is required in data.table 1.9.8 and later, where unique() compares all columns by default
unique(d, by = key(d))[CJ(unique(month.year), unique(LocationID)), .N, by = .EACHI]
#   month.year LocationID N
#1:      01-12          1 2
#2:      01-12          2 1
#3:      01-13          1 0
#4:      01-13          2 1

This relies on unique keeping one row per key combination (month.year, LocationID, OrderID) and preserving the key, so the next join is easy. When this was written, unique used the key by default; since data.table 1.9.8 it compares all columns unless by is given, hence by = key(d).
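
As an alternative sketch that does not rely on a key or on the defaults of unique, the same table can be built by counting distinct OrderIDs with uniqueN and then joining onto a full month-by-location grid from CJ. The names counts, grid, result and Count_of_orders are chosen here for illustration, the sample is reduced to the three columns that matter, and tz = "UTC" is an assumption added to make the parsing reproducible. Note that with the question's format string "%d/%m/%Y" the sample dates parse as 1 December, so the labels come out as 12-12 and 12-13 rather than 01-12 and 01-13.

```r
library(data.table)

# Sample rows from the question, reduced to the columns used here.
DT <- data.table(
  OrderDate = as.POSIXct(
    c("1/12/2012 5:40", "1/12/2012 5:40", "1/12/2012 5:45",
      "1/12/2012 5:46", "1/12/2013 7:40", "1/12/2013 7:40"),
    format = "%d/%m/%Y %H:%M", tz = "UTC"),  # tz is an assumption
  OrderID    = c(100001L, 100001L, 100002L, 100003L, 123001L, 123001L),
  LocationID = c(1L, 1L, 1L, 2L, 2L, 2L)
)

# Month-year label for grouping.
DT[, month.year := format(OrderDate, "%m-%y")]

# Distinct orders per month and location (only combinations that occur).
counts <- DT[, .(Count_of_orders = uniqueN(OrderID)),
             by = .(month.year, LocationID)]

# Every month x location combination, so empty cells can be reported too.
grid <- CJ(month.year = unique(DT$month.year),
           LocationID = unique(DT$LocationID))

# Join onto the grid, then turn the unmatched NA counts into 0.
result <- counts[grid, on = c("month.year", "LocationID")]
result[is.na(Count_of_orders), Count_of_orders := 0L]
print(result)
```

The result is in long format with one row per month and location, which is the shape ggplot2 generally expects, and location 1 in 12-13 appears with a count of zero instead of being dropped.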

Solution 1: 2, accepted, 2015-03-31 16:35:49
