R: compare two data frames on two columns and increment a third

Question

I have the following problem. In one data frame I have daily observations of customers. In another I have the purchases they made. I am interested in how many items each customer had purchased so far on any given day. I solved this with a for loop, but was wondering whether there is a more efficient way.

Let us look at an example:

# 10 customers, each observed on 10 different dates (11 to 20)
customers <- data.frame(id = rep(1:10, 10), date = rep(11:20, each = 10))
purchases <- data.frame(id = c(1, 1, 4, 6, 6, 6), date = c(12, 14, 12, 9, 13, 17))

# I can achieve what I want if I add a cumulative count column and run a for loop
# (the purchases are already sorted by date within each id)
purchases$count <- sapply(seq_along(purchases$id), function(i) sum(purchases$id[i] == purchases$id[1:i]))

customers$count <- 0
for (i in seq_len(nrow(purchases))) {
  customers[customers$id == purchases[i, "id"] & customers$date >= purchases[i, "date"], "count"] <- purchases[i, "count"]
}

customers
    id date count
1    1   11     0
2    2   11     0
3    3   11     0
4    4   11     0
5    5   11     0
6    6   11     1
7    7   11     0
8    8   11     0
9    9   11     0
10  10   11     0
11   1   12     1
12   2   12     0
13   3   12     0
14   4   12     1
 .   .    .     .
 .   .    .     .
100  10   20    0

What would be a faster way to do this?

Thanks in advance.

Answer 1

Here's a base R solution, although packages such as dplyr and data.table are also useful for this:

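# assumes customers and purchases are as first defined in the question (no count columns yet)
# make sure the purchases table is ordered correctly before doing the cumulative count
purchases <- purchases[order(purchases$id, purchases$date), ]
# id == id is TRUE on every row, so cumsum within each id gives a running purchase count
cum_purchases <- cbind(purchases,
                       count = with(purchases, ave(id == id, id, FUN = cumsum)))
cum_purchases
#   id date count
# 1  1   12     1
# 2  1   14     2
# 3  4   12     1
# 4  6    9     1
# 5  6   13     2
# 6  6   17     3
out <- merge(customers, cum_purchases, all.x = TRUE) # "left join" on id and date
out
# note that this solution produces NA instead of 0 for rows with no matching purchase
# you can change this with:
out$count[is.na(out$count)] <- 0
# the merge matches exact (id, date) pairs only, so a count shows up on the purchase
# day itself and is not carried forward to later dates
out[order(out$date, out$id), ] # same row order as the example output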

R gives you lots of ways to count things. (Edited to use cumulative count.)
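If the goal is the running total as of every observed day, including days with no purchase and purchases made before the first observation, one possible base R sketch uses findInterval() per customer. This is only an illustration under the question's example data; the names dates_by_id and count_so_far are made up here and are not from the original posts.

```r
# Example data from the question
customers <- data.frame(id = rep(1:10, 10), date = rep(11:20, each = 10))
purchases <- data.frame(id = c(1, 1, 4, 6, 6, 6), date = c(12, 14, 12, 9, 13, 17))

# Purchase dates grouped by customer id, sorted because findInterval() needs sorted input
dates_by_id <- lapply(split(purchases$date, purchases$id), sort)

# Hypothetical helper: number of purchases by `id` dated on or before each observation date
count_so_far <- function(obs_dates, id) {
  d <- dates_by_id[[as.character(id)]]
  if (is.null(d)) return(integer(length(obs_dates)))  # customer never purchased
  findInterval(obs_dates, d)                          # how many elements of d are <= each date
}

# Loop over customers (not purchases); each step is vectorised over that customer's rows
customers$count <- 0L
for (id in unique(customers$id)) {
  rows <- customers$id == id
  customers$count[rows] <- count_so_far(customers$date[rows], id)
}

head(customers[order(customers$date, customers$id), ], 14)
```

The loop runs once per customer rather than once per purchase, and the purchase made on date 9 is still counted even though no customer row has that date. Whether this is actually faster than the original loop depends on the data, so it is worth timing on the real tables.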

Solution 1
1 2015-07-04 00:00:42
