R: Loop through data frame, extract subset of multiple variables, then store in an aggregate dataset

I have a summary data table of roughly 60 million rows. Simplified, the data looks like this:

ServiceN  Customer  Product  LValue  EDate       CovBDate    CovEDate
1         1         12       3       2016-08-03  2016-07-07  2017-07-06
2         1         12       19      2016-07-07  2016-07-07  2017-07-06
3         2         23       222     2017-09-09  2016-10-01  2017-09-31
4         2         23       100     2017-10-01  2017-10-01  2018-09-31

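I need to loop through each row and subset the whole dataset to that row's customer and to all entry dates (EDate) that fall between the row's CovBDate and CovEDate. Then I need the sum of LValue for each product within that subset (we only look at 10 products, so that part isn't too bad).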

For example, the final dataset would look like this:

ServiceN  Customer  Product  LValue  EDate       CovBDate    CovEDate    Prod12  Prod23
1         1         12       3       2016-08-03  2016-07-07  2017-07-06  22      0
2         1         12       19      2016-07-07  2016-07-07  2017-07-06  22      0
3         2         23       222     2017-09-09  2016-10-01  2017-09-31  0       222
4         2         23       100     2017-10-01  2017-10-01  2018-09-31  0       100
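To make the per-row rule concrete, here is a minimal base R sketch of the calculation for a single row. It assumes the dates are kept as ISO 8601 strings, which `>=` and `<=` order correctly as text; converting with `as.Date()` would be awkward here because the sample contains values such as 2017-09-31, which is not a real calendar date. The function name `window_sums` is hypothetical, not part of the question.

```r
# Sample data from the question (dates kept as ISO 8601 strings)
dataset <- data.frame(
  ServiceN = 1:4,
  Customer = c(1, 1, 2, 2),
  Product  = c(12, 12, 23, 23),
  LValue   = c(3, 19, 222, 100),
  EDate    = c("2016-08-03", "2016-07-07", "2017-09-09", "2017-10-01"),
  CovBDate = c("2016-07-07", "2016-07-07", "2016-10-01", "2017-10-01"),
  CovEDate = c("2017-07-06", "2017-07-06", "2017-09-31", "2018-09-31"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: LValue total per product over row i's coverage window
window_sums <- function(df, i) {
  # Same customer as row i, and EDate between row i's CovBDate and CovEDate
  in_window <- df$Customer == df$Customer[i] &
    df$EDate >= df$CovBDate[i] &
    df$EDate <= df$CovEDate[i]
  products <- sort(unique(df$Product))
  # sum() of an empty selection is 0, so products with no rows give 0
  sums <- vapply(products,
                 function(p) sum(df$LValue[in_window & df$Product == p]),
                 numeric(1))
  setNames(sums, paste0("Prod", products))
}

window_sums(dataset, 3)  # Prod12 = 0, Prod23 = 222
```

For row 3 only the 2017-09-09 entry falls inside 2016-10-01 to 2017-09-31, so Prod23 is 222, matching the third row of the table above.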

I'm not sure where to start with this, but here is what I have so far (it doesn't work):

library(data.table)

for (i in seq_len(nrow(dataset))) {
  # data.table subsetting: same customer, EDate inside row i's coverage window
  tempdata <- dataset[Customer == Customer[i] & EDate >= CovBDate[i] &
    EDate <= CovEDate[i]]
  tempdata$Prod12 <- with(tempdata, sum(LValue[Product == "12"], na.rm = TRUE))
  # I could make this a function, but I want to get this for loop automated first...
  tempdata$Prod23 <- with(tempdata, sum(LValue[Product == "23"], na.rm = TRUE))
  # tempdata is rebuilt on every pass, so these sums never reach dataset
}

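So my questions are:
1) How do I make the for loop work with this many variables?
2) How do I get the new variables added to the original dataset (called dataset)?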
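For question 2, one option is to create the product columns on dataset up front and fill them cell by cell with data.table's `set()`, which modifies the table by reference instead of a temporary copy. This is a minimal sketch assuming the data is a data.table with ISO date strings; the names `products` and `new_cols` are illustrative. It rescans the whole table for every row, so its cost grows quadratically and it is only practical for small data, not 60 million rows.

```r
library(data.table)

# Sample data from the question (dates kept as ISO 8601 strings)
dataset <- data.table(
  ServiceN = 1:4,
  Customer = c(1, 1, 2, 2),
  Product  = c(12, 12, 23, 23),
  LValue   = c(3, 19, 222, 100),
  EDate    = c("2016-08-03", "2016-07-07", "2017-09-09", "2017-10-01"),
  CovBDate = c("2016-07-07", "2016-07-07", "2016-10-01", "2017-10-01"),
  CovEDate = c("2017-07-06", "2017-07-06", "2017-09-31", "2018-09-31")
)

# Illustrative names: the product list and one output column per product
products <- sort(unique(dataset$Product))
new_cols <- paste0("Prod", products)

# Add the output columns to dataset itself (by reference), initialised to 0
dataset[, (new_cols) := 0]

for (i in seq_len(nrow(dataset))) {
  # Rows of the same customer whose EDate lies inside row i's coverage window
  in_window <- dataset$Customer == dataset$Customer[i] &
    dataset$EDate >= dataset$CovBDate[i] &
    dataset$EDate <= dataset$CovEDate[i]
  for (k in seq_along(products)) {
    total <- sum(dataset$LValue[in_window & dataset$Product == products[k]],
                 na.rm = TRUE)
    # set() overwrites one cell in place, so the result lands in dataset
    set(dataset, i = i, j = new_cols[k], value = total)
  }
}

dataset
```

Deriving the column list from the data means the loop does not need one hand-written line per product, which addresses question 1 as well.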

Answer: using dplyr, you can do the following:

library(dplyr)

dataset <- data.frame(ServiceN = c("1", "2", "3", "4"),
    Customer = c("1", "1", "2", "2"),
    Product = c("12", "12", "23", "23"),
    LValue = c(3, 19, 222, 100),
    EDate  = c("2016-08-03", "2016-07-07", "2017-09-09", "2017-10-01"),
    CovBDate = c("2016-07-07", "2016-07-07", "2016-10-01", "2017-10-01"),
    CovEDate = c("2017-07-06", "2017-07-06", "2017-09-31", "2018-09-31"),
    stringsAsFactors = FALSE)

## Group by customer and product so summary results are per-customer/product combination
dataset %>% group_by(Customer, Product) %>%
    ## Keep rows whose own EDate lies inside their own coverage window (ISO date strings compare correctly as text)
    filter(EDate >= CovBDate & EDate <= CovEDate) %>%
    ## Sum the LValue based on the defined groupings
    summarise(Sum = sum(LValue))


# # A tibble: 2 x 3
# # Groups:   Customer [?]
#   Customer Product   Sum
#   <chr>    <chr>   <dbl>
# 1 1        12         22
# 2 2        23        322
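This groups by customer and product, so it returns one total per pair (322 for customer 2) rather than the per-row window sums in the desired output (222 and 100), and it does not add columns to dataset. A sketch that does both, assuming the data.table package, is a non-equi self-join: each row's coverage window is matched against the same customer's EDate values, the matches are pivoted to one column per product with `dcast()`, and the result is merged back. The helper `to_int` and the integer columns EDateN, CovBDateN and CovEDateN are assumptions of this sketch: they turn the ISO strings into YYYYMMDD numbers, which keep the same order and sidestep invalid dates such as 2017-09-31.

```r
library(data.table)

# Sample data from the question (dates kept as ISO 8601 strings)
dataset <- data.table(
  ServiceN = 1:4,
  Customer = c(1, 1, 2, 2),
  Product  = c(12, 12, 23, 23),
  LValue   = c(3, 19, 222, 100),
  EDate    = c("2016-08-03", "2016-07-07", "2017-09-09", "2017-10-01"),
  CovBDate = c("2016-07-07", "2016-07-07", "2016-10-01", "2017-10-01"),
  CovEDate = c("2017-07-06", "2017-07-06", "2017-09-31", "2018-09-31")
)

# Hypothetical helper: "2016-08-03" -> 20160803L (same ordering, integer type)
to_int <- function(x) as.integer(gsub("-", "", x, fixed = TRUE))
dataset[, `:=`(EDateN = to_int(EDate),
               CovBDateN = to_int(CovBDate),
               CovEDateN = to_int(CovEDate))]

# Non-equi self-join: for every window row (i), find rows (x) of the same
# customer with CovBDateN <= EDateN <= CovEDateN
matches <- dataset[dataset,
                   .(ServiceN = i.ServiceN, Product = x.Product, LValue = x.LValue),
                   on = .(Customer, EDateN >= CovBDateN, EDateN <= CovEDateN),
                   nomatch = 0L, allow.cartesian = TRUE]

# One column per product, summing LValue; missing combinations become 0
matches[, ProdCol := paste0("Prod", Product)]
wide <- dcast(matches, ServiceN ~ ProdCol, value.var = "LValue",
              fun.aggregate = sum, fill = 0)

# Attach the sums to the original rows; windows with no matches get 0
out <- merge(dataset, wide, by = "ServiceN", all.x = TRUE)
prod_cols <- setdiff(names(wide), "ServiceN")
for (col in prod_cols) {
  idx <- which(is.na(out[[col]]))
  if (length(idx)) set(out, idx, col, 0)
}

# Drop the helper columns added by this sketch
out[, c("EDateN", "CovBDateN", "CovEDateN") := NULL]
out
```

The join replaces the row-by-row loop with a single pass that data.table can optimise, which is usually the more realistic route at 60 million rows, though the join result can grow large when windows overlap heavily.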

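Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.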

 