Say I have a data frame of customers, cust_df:
Date ArrivalTime TimeInStore AmountSpent
170920 930 30 20
170920 1000 20 20
170920 1001 30 100
170920 1500 15 10
170921 1030 10 200
170921 1111 25 50
170921 1900 10 75
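To make the example reproducible, the table above can be built in R like this. This is a sketch: storing every column as an integer is an assumption, and Date and ArrivalTime are presumably yymmdd and hhmm values, kept as plain numbers as in the question.

```r
# Rebuild the example customers table from the question.
# Assumption: all columns are integers; Date is presumably yymmdd, ArrivalTime hhmm.
cust_df <- data.frame(
  Date        = c(170920L, 170920L, 170920L, 170920L, 170921L, 170921L, 170921L),
  ArrivalTime = c(930L, 1000L, 1001L, 1500L, 1030L, 1111L, 1900L),
  TimeInStore = c(30L, 20L, 30L, 15L, 10L, 25L, 10L),
  AmountSpent = c(20L, 20L, 100L, 10L, 200L, 50L, 75L)
)

str(cust_df)  # 7 obs. of 4 variables
```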
I want to do 2 things:
1. Check how much time and money the first 3 customers of each day spend.
2. Compare that to 3 random customers from each day (they may or may not be among the first three).
If a day had fewer than 3 customers, I want to include all of that day's customers.
What is the most efficient way to do so?
Currently my code is:
cust_df <- cust_df[order(cust_df$Date, cust_df$ArrivalTime), ]  # order by day, then arrival time
cust_df_by_Date <- split(cust_df, f = cust_df$Date)              # one data frame per day
cust_num <- sapply(cust_df_by_Date, nrow)                         # number of customers per day
first_cust_df <- c()
i <- 1
for (num in cust_num) {
  if (num >= 3) {
    first_cust_df <- rbind(first_cust_df, cust_df_by_Date[[i]][1:3, ])
  } else {
    first_cust_df <- rbind(first_cust_df, cust_df_by_Date[[i]][1:num, ])
  }
  i <- i + 1
}
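A shorter base-R version of this loop is sketched below. It relies on head(x, 3) returning at most 3 rows, so a day with fewer customers keeps all of them and no if/else is needed. The sample data is the table from the question.

```r
# Example data from the question (integer columns assumed).
cust_df <- data.frame(
  Date        = c(170920L, 170920L, 170920L, 170920L, 170921L, 170921L, 170921L),
  ArrivalTime = c(930L, 1000L, 1001L, 1500L, 1030L, 1111L, 1900L),
  TimeInStore = c(30L, 20L, 30L, 15L, 10L, 25L, 10L),
  AmountSpent = c(20L, 20L, 100L, 10L, 200L, 50L, 75L)
)

# Order by day, then by arrival time within each day.
cust_df <- cust_df[order(cust_df$Date, cust_df$ArrivalTime), ]

# head(x, 3) returns at most 3 rows, so short days are kept whole.
first_cust_df <- do.call(rbind, lapply(split(cust_df, cust_df$Date), head, 3))
rownames(first_cust_df) <- NULL  # drop the "170920.1"-style row names added by rbind

first_cust_df
```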
And for the random part (ldply comes from the plyr package):
library(plyr)
rand_cust_sampling_df <- ldply(cust_df_by_Date, function(x) x[sample(1:nrow(x), min(nrow(x), 3)), ])
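The same sampling can also be done in base R without plyr, with min(nrow(x), 3) playing the role of the ifelse(). In this sketch, sample_up_to_3 is a hypothetical helper name, and set.seed() is there only so that repeated runs give the same draw.

```r
# Example data from the question (integer columns assumed).
cust_df <- data.frame(
  Date        = c(170920L, 170920L, 170920L, 170920L, 170921L, 170921L, 170921L),
  ArrivalTime = c(930L, 1000L, 1001L, 1500L, 1030L, 1111L, 1900L),
  TimeInStore = c(30L, 20L, 30L, 15L, 10L, 25L, 10L),
  AmountSpent = c(20L, 20L, 100L, 10L, 200L, 50L, 75L)
)

set.seed(1)  # only for a reproducible random draw

# Hypothetical helper: draw min(n, 3) distinct rows without replacement.
# sample(n, k) samples from 1:n, which also works when n == 1.
sample_up_to_3 <- function(x) x[sample(nrow(x), min(nrow(x), 3)), ]

rand_cust_sampling_df <- do.call(rbind, lapply(split(cust_df, cust_df$Date), sample_up_to_3))
rownames(rand_cust_sampling_df) <- NULL

rand_cust_sampling_df
```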
I'm quite sure that there is a more efficient way to do so, but I'm new to this language and couldn't find an answer to this specific question.
Thanks!
The dplyr package can help you here.
install.packages("dplyr")
library(dplyr)
To get the first 3 customers on each day, group_by Date, then slice:
cust_df %>%
group_by(Date) %>%
slice(1:3)
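Note that slice(1:3) takes the first three rows in whatever order the data already has, so it picks the earliest arrivals only if the rows are sorted by ArrivalTime within each day (as they happen to be in the example). The sketch below makes the ordering explicit and assumes dplyr 1.0.0 or later for slice_head(). Like slice(1:3), it keeps every row of a day with fewer than 3 customers. The shuffled row order is made up to show that the sort matters.

```r
library(dplyr)

# Example data from the question (integer columns assumed).
cust_df <- data.frame(
  Date        = c(170920L, 170920L, 170920L, 170920L, 170921L, 170921L, 170921L),
  ArrivalTime = c(930L, 1000L, 1001L, 1500L, 1030L, 1111L, 1900L),
  TimeInStore = c(30L, 20L, 30L, 15L, 10L, 25L, 10L),
  AmountSpent = c(20L, 20L, 100L, 10L, 200L, 50L, 75L)
)

# Deliberately scramble the rows to show that the result does not depend on input order.
shuffled <- cust_df[c(4, 2, 7, 1, 5, 3, 6), ]

first_three <- shuffled %>%
  group_by(Date) %>%
  arrange(ArrivalTime, .by_group = TRUE) %>%  # sort by Date, then ArrivalTime
  slice_head(n = 3) %>%                       # at most 3 rows per day
  ungroup()

first_three
```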
It's not clear from your question how you want to summarise time and spending, but you could sum them, for example, like this:
cust_df %>%
group_by(Date) %>%
slice(1:3) %>%
summarise(sumSpent = sum(AmountSpent))
Date sumSpent
<int> <int>
1 170920 140
2 170921 325
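To cover both quantities the question asks about, summarise() can compute several columns at once. This sketch sums TimeInStore and AmountSpent for the first three arrivals of each day and counts how many customers went into each sum. The column names totalTime, totalSpent and n are arbitrary choices.

```r
library(dplyr)

# Example data from the question (integer columns assumed).
cust_df <- data.frame(
  Date        = c(170920L, 170920L, 170920L, 170920L, 170921L, 170921L, 170921L),
  ArrivalTime = c(930L, 1000L, 1001L, 1500L, 1030L, 1111L, 1900L),
  TimeInStore = c(30L, 20L, 30L, 15L, 10L, 25L, 10L),
  AmountSpent = c(20L, 20L, 100L, 10L, 200L, 50L, 75L)
)

first_summary <- cust_df %>%
  group_by(Date) %>%
  arrange(ArrivalTime, .by_group = TRUE) %>%  # earliest arrivals first
  slice(1:3) %>%
  summarise(totalTime  = sum(TimeInStore),
            totalSpent = sum(AmountSpent),
            n          = n())  # customers included (fewer than 3 on short days)

first_summary
```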
You can randomly select 3 customers by date using sample_n:
cust_df %>%
group_by(Date) %>%
sample_n(3) %>%
summarise(sumSpent = sum(AmountSpent))
Date sumSpent
<int> <int>
1 170920 130
2 170921 325
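One caveat: sample_n(3) stops with an error for any day with fewer than 3 customers unless replace = TRUE is set, and that would allow the same customer to be drawn twice. It therefore does not meet the "include all customers" requirement. In dplyr 1.0.0 and later, sample_n() is superseded by slice_sample(), and slice_sample(n = 3) silently truncates to the group size instead. The sketch below uses it and stacks both summaries into one table for comparison. The extra one-customer day 170922, the helper name summarise_visit, the "sample" column and the seed are all made up for illustration.

```r
library(dplyr)

# Question's data plus a hypothetical day (170922) with only one customer.
cust_df <- data.frame(
  Date        = c(170920L, 170920L, 170920L, 170920L, 170921L, 170921L, 170921L, 170922L),
  ArrivalTime = c(930L, 1000L, 1001L, 1500L, 1030L, 1111L, 1900L, 1200L),
  TimeInStore = c(30L, 20L, 30L, 15L, 10L, 25L, 10L, 5L),
  AmountSpent = c(20L, 20L, 100L, 10L, 200L, 50L, 75L, 15L)
)

set.seed(42)  # only for a reproducible random draw

# Hypothetical helper: per-day totals for a grouped data frame.
summarise_visit <- function(df) {
  df %>% summarise(totalTime = sum(TimeInStore), totalSpent = sum(AmountSpent), n = n())
}

first3 <- cust_df %>%
  group_by(Date) %>%
  arrange(ArrivalTime, .by_group = TRUE) %>%
  slice_head(n = 3)

random3 <- cust_df %>%
  group_by(Date) %>%
  slice_sample(n = 3)  # without replacement; short days return all their rows

# Argument names become the values of the "sample" column.
comparison <- bind_rows(first = summarise_visit(first3),
                        random = summarise_visit(random3),
                        .id = "sample")

comparison
```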