Grouping and checking for a condition in R
I have two tables: restaurant_trans and restaurant_master.
restaurant_trans has the columns name, date, and net_sales.
This is a transaction file with sales for 50 restaurants, recorded for 30 days each (1,500 observations).
restaurant_master has the columns name, go.live.date, and franchise.
This is a master file with the name of each restaurant; 'go.live.date' is the date a particular device was installed in that restaurant.
I want to find the net sales of each restaurant before and after the device was installed. I first want the data to be grouped.
I tried this code for subsetting the data:
dummayvar = 0;
for (i in 1:nrow(restaurant_master)){
  for (j in 1:nrow(restaurant_trans)){
    if(restaurant_trans$Restaurant.Name[j]==restaurant_master$Restaurant.Name[i]){
      if(restaurant_trans$Date[j] < restaurant_master$Go.Live.Date[i]){
        append(dummayvar, restaurant_trans$Date)
      }
    }
  }
}
This is giving an error:
"level sets of factors are different"
Please help!!
Consider a merge() instead of nested for loops. Simply merge the restaurant netsales and master data frames by name, then subset the merged data frame by comparing each sale's date with the restaurant's go.live.date. Finally, aggregate net sales by restaurant name and franchise together, or by either one individually.
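On the error itself: it is most likely raised by the == comparison of the two name columns. If both columns were read in as factors (the default for read.csv before R 4.0) and the two files do not contain exactly the same set of names, R refuses to compare them. A minimal sketch of that behavior, using the made-up vectors trans_name and master_name in place of the real columns:

```r
# Hypothetical stand-ins for the two name columns, both stored as factors
trans_name  <- factor(c("A", "B", "C", "D"))
master_name <- factor(c("A", "B", "C"))

# Comparing factors whose level sets differ raises an error;
# tryCatch captures the message instead of stopping the script
result <- tryCatch(trans_name[1] == master_name[1],
                   error = function(e) conditionMessage(e))
print(result)

# Converting both sides to character makes the comparison valid
same <- as.character(trans_name[1]) == as.character(master_name[1])
print(same)
```

The same caution applies to the date columns: a < comparison is not meaningful on unordered factors, so dates need to be converted with as.Date() first. With merge(), the name matching is handled by the join and no element-wise factor comparison is needed.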
# DATA FRAME EXAMPLES
netsales <- data.frame(name=c('A', 'A', 'A', 'A', 'A',
                              'B', 'B', 'B', 'B', 'B',
                              'C', 'C', 'C', 'C', 'C'),
                       date=c('6/1/2015', '6/15/2015', '7/1/2015', '9/1/2015', '11/15/2015',
                              '6/5/2015', '6/20/2015', '7/15/2015', '8/1/2015', '10/15/2015',
                              '6/10/2015', '7/10/2015', '8/15/2015', '9/20/2015', '9/30/2015'),
                       net_sales=c(1500, 600, 1200, 850, 750,
                                   1120, 560, 720, 340, 890,
                                   1150, 410, 300, 250, 900))
netsales$date <- as.Date(strptime(netsales$date, '%m/%d/%Y'))
str(netsales)
master <- data.frame(name=c('A', 'B', 'C'),
                     go.live.date=c('7/25/2015', '8/1/2015', '7/1/2015'),
                     franchise=c('R Co.', 'Python, Inc.', 'C# Ltd.'))
master$go.live.date <- as.Date(strptime(master$go.live.date, '%m/%d/%Y'))
str(master)
# MERGE AND AGGREGATE BEFORE GO LIVE SALES
beforelive <- merge(netsales, master, by='name')
beforelive <- beforelive[beforelive$date < beforelive$go.live.date,]
beforelivesales <- aggregate(net_sales ~ name + franchise, beforelive, FUN=sum)
# MERGE AND AGGREGATE AFTER GO LIVE SALES
afterlive <- merge(netsales, master, by='name')
afterlive <- afterlive[afterlive$date >= afterlive$go.live.date,]
afterlivesales <- aggregate(net_sales ~ name + franchise, afterlive, FUN=sum)
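The two merge-and-aggregate passes above could also be folded into one. The sketch below is only one possible variation, not part of the original answer: it rebuilds the same example data so it runs on its own, and the column name period and the result name sales_by_period are made up for illustration.

```r
# Same example data as in the answer, rebuilt so this block is self-contained
netsales <- data.frame(
  name = rep(c("A", "B", "C"), each = 5),
  date = as.Date(c("6/1/2015", "6/15/2015", "7/1/2015", "9/1/2015", "11/15/2015",
                   "6/5/2015", "6/20/2015", "7/15/2015", "8/1/2015", "10/15/2015",
                   "6/10/2015", "7/10/2015", "8/15/2015", "9/20/2015", "9/30/2015"),
                 format = "%m/%d/%Y"),
  net_sales = c(1500, 600, 1200, 850, 750,
                1120, 560, 720, 340, 890,
                1150, 410, 300, 250, 900)
)
master <- data.frame(
  name = c("A", "B", "C"),
  go.live.date = as.Date(c("7/25/2015", "8/1/2015", "7/1/2015"),
                         format = "%m/%d/%Y"),
  franchise = c("R Co.", "Python, Inc.", "C# Ltd.")
)

# Merge once, then label each transaction relative to its go-live date
# ("period" is a hypothetical helper column)
merged <- merge(netsales, master, by = "name")
merged$period <- ifelse(merged$date < merged$go.live.date, "before", "after")

# A single aggregate call gives before and after totals per restaurant
sales_by_period <- aggregate(net_sales ~ name + franchise + period,
                             data = merged, FUN = sum)
print(sales_by_period)
```

As in the answer's code, a sale on the go-live date itself is counted as "after". Keeping both periods in one data frame makes it easier to compare them side by side, at the cost of one extra column.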