R compare two dataframes on two columns and increment a third
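I have the following problem. In one data frame I have daily observations of customers. In a second data frame I have the purchases they made. For any given day, I am interested in how many items each customer has purchased so far. I solved this with a for loop, but I wonder whether there is a more efficient way?
Let's look at an example: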
# 10 customers, each observed on 10 different dates
customers<-data.frame(id=rep(1:10, 10), date=rep(11:20, each=10))
purchases<-data.frame(id=c(1,1,4,6,6,6), date=c(12, 14, 12, 9, 13, 17))
# I can achieve what I want if I add a cumulative count column and run a for loop
# (this relies on purchases being sorted by date within each id)
purchases$count <- sapply(seq_along(purchases$id), function(i) sum(purchases$id[i] == purchases$id[1:i]))
customers$count <- 0
for (i in seq_len(nrow(purchases))) {
  customers[customers$id == purchases[i, "id"] & customers$date >= purchases[i, "date"], "count"] <- purchases[i, "count"]
}
customers
id date count
1 1 11 0
2 2 11 0
3 3 11 0
4 4 11 0
5 5 11 0
6 6 11 1
7 7 11 0
8 8 11 0
9 9 11 0
10 10 11 0
11 1 12 1
12 2 12 0
13 3 12 0
14 4 12 1
. . . .
. . . .
100 10 20 0
What would be a faster way to do this?
Thanks in advance.
Here is a base R solution, though packages such as dplyr and data.table would also be useful for this:
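# assumes customers and purchases as first created, without the count columns added in the question
# make sure purchases table is ordered correctly to do cumulative count
purchases <- purchases[order(purchases$id, purchases$date), ]
cum_purchases <- cbind(purchases,
                       count = with(purchases, ave(id == id, id, FUN = cumsum)))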
cum_purchases
# id date count
# 1 1 12 1
# 2 1 14 2
# 3 4 12 1
# 4 6 9 1
# 5 6 13 2
# 6 6 17 3
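out <- merge(customers, cum_purchases, by = c("id", "date"), all.x = TRUE) # "left join" on id and date
out
# note that this solution produces NA instead of 0 for no purchases
# you can change this with:
out$count[is.na(out$count)] <- 0
# sort by date, then id, to match the row order of the example output
# note: merge() matches exact (id, date) pairs only, so a count appears on the
# purchase date itself and is not carried forward to later dates
out[order(out$date, out$id), ]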
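Because the merge on id and date matches exact pairs only, a customer such as id 6 (whose first purchase falls on date 9, before any observation) would get a count of 0 on date 11, whereas the loop in the question gives 1. One way to carry the count forward is to count, for every observation, the purchases by the same customer on or before that date. The following is a minimal base R sketch of that idea, not necessarily the fastest option; the helper name `count_to_date` is hypothetical and does not come from the original answer.

```r
customers <- data.frame(id = rep(1:10, 10), date = rep(11:20, each = 10))
purchases <- data.frame(id = c(1, 1, 4, 6, 6, 6),
                        date = c(12, 14, 12, 9, 13, 17))

# Hypothetical helper: for each observation (id, date), count the purchases
# made by the same customer on or before that date.
count_to_date <- function(cust, purch) {
  mapply(function(i, d) sum(purch$id == i & purch$date <= d),
         cust$id, cust$date)
}

customers$count <- count_to_date(customers, purchases)

# Inspect one customer to see the count being carried forward
customers[customers$id == 6, ]
```

This version needs neither a pre-sorted purchases table nor the NA replacement step, because customers with no purchases simply sum to 0.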
R offers many ways to count things. (Edited to use a cumulative count.)
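As an illustration of that point, here is a small sketch of three base R ways to compute a running count per customer on the example purchases table. The variable names `count_a`, `count_b` and `count_c` are made up for this example, and the third variant assumes the rows are already sorted by id.

```r
purchases <- data.frame(id = c(1, 1, 4, 6, 6, 6),
                        date = c(12, 14, 12, 9, 13, 17))
purchases <- purchases[order(purchases$id, purchases$date), ]

# 1. Position of each row within its id group
count_a <- ave(purchases$id, purchases$id, FUN = seq_along)

# 2. Cumulative sum of ones within each id group
count_b <- ave(rep(1, nrow(purchases)), purchases$id, FUN = cumsum)

# 3. Run lengths of consecutive ids; valid only because rows are sorted by id
count_c <- sequence(rle(purchases$id)$lengths)

# All three variants agree on this data
all(count_a == count_b) && all(count_b == count_c)
```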