Mapply with data.frame/list as the Arguments for the Function

Question

In short, I have a larger function that creates data.frames that are subsets of a larger data.frame and are named after the arguments of the function. It's building data.frames for raw data AND the outputs and the predictive output of Holt-Winters...meaning it is creating multiple data.frames. A small example is the following (though there's not enough intervals here to actually generate a ts class data.frame):

Group <- c("Primary_Group","Primary_Group","Primary_Group","Primary_Group","Primary_Group","Primary_Group","Secondary_Group","Secondary_Group","Secondary_Group","Secondary_Group","Secondary_Group","Secondary_Group","Tertiary_Group","Tertiary_Group","Tertiary_Group","Tertiary_Group","Tertiary_Group","Tertiary_Group")
Day <- c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
Type <- c("A","A","A","B","B","B","A","A","A","B","B","B","A","A","A","B","B","B")
Value <- c(7,3,10,3,9,4,0,9,3,10,1,6,3,4,10,2,3,1)
df <- as.data.frame(cbind(Group,Day,Type,Value))

Fun <- function(Group,Type, A, B, G){
    df <- Data[Data$Group== Group & Data$Type== Type, ]
    assign(paste(Group,Type,"_df",sep = ''), df, envir = parent.frame()) 
    df_holtwinters <- HoltWinters(ts(Data[Data$Group== Group & Data$Type== Type, ], 
                                  frequency = 365), alpha = A, beta = B, gamma = G)
    assign(paste(Group,Type,"_hw",sep = ''), df_holtwinters, envir = parent.frame()) 
}

You'll notice that the Group and Type are characters, while A, B, G are either numeric or NULL .

If I now have a data.frame composed of lists values, how could I best loop the above function (likely with mapply ) to use the values from each column in row one...then each column from row 2 etc - creating several data frames.

argGroup <- c("Primary_Group","Primary_Group","Secondary_Group","Secondary_Group","Tertiary_Group","Tertiary_Group")
argType <- c("A","B","A","B","A","B")
argA <- c(NA, NA, NA, NA, NA, NA)
argB <- c(0.05, 0.05, NA, NA, NA, NULL)
argG <- c(NA, NA, NA, NA, NA, NA)

argGroup[is.na(argGroup)] <- list(NULL)
argType[is.na(argType)] <- list(NULL)
argA[is.na(argA)] <- list(NULL)
argB[is.na(argB)] <- list(NULL)
argG[is.na(argG)] <- list(NULL)

Arguments <- cbind(argType, argType, argA, argB, argG)

Ideally, I would get the following data.frames to generate...

Primary_Group_A_df
Primary_Group_A_hw
Primary_Group_B_df
Primary_Group_B_hw
Secondary_Group_A_df
Secondary_Group_A_hw
Secondary_Group_B_df
Secondary_Group_B_hw
Tertiary_Group_A_df
Tertiary_Group_A_hw
Tertiary_Group_B_df
Tertiary_Group_B_hw

It would also be helpful to understand how to best (most automated way) rbind all the _df together and all the _hw together.

Any help would be amazing and very appreciated. Thanks so much!

Answer 1

You're losing type information by using as.data.frame(cbind(...)) , just use data.frame directly:

Data <- data.frame(
  Group = rep(c("Primary_Group", "Secondary_Group", "Tertiary_Group"), each = 6L),
  Day = rep(1L:3L, 6L),
  Type = rep(rep(c("A", "B"), each = 3L), 3L),
  Value = c(7,3,10,3,9,4,0,9,3,10,1,6,3,4,10,2,3,1)
)

Afterwards, I presume you can do the following:

split_data <- split(Data, as.list(Data[, c("Group", "Type")]))
dfs <- do.call(rbind, split_data)

dfs_hw <- lapply(split_data, function(sub_data) {
  Map(argA, argB, argG, f = function(A, B, G) {
    HoltWinters(ts(sub_data, frequency = 365), alpha = A, beta = B, gamma = G)
  })
})

dfs_hw <- do.call(rbind, unlist(dfs_hw, recursive = FALSE))

But I get an error from HoltWinters , so I can't say for sure. Also, I think dfs simply has Data again, just reordered.

Answer 2

Avoid flooding your global environment with many similarly structured objects. Consider using a container such as a list to hold the many dataframes. One useful method is by to subset your dataframe by one or more factor(s) such as Group and Type to return a list of dataframes. Also, don't iterate by rows but merge arguments with data for one pass of arguments per subset.

Specifically, call by twice for df and hw lists. But first, merge the df and Arguments data frames by Group and Type . One challenge is NULL cannot be stored in a data frame, so consider saving "NULL" string and assign temp variables to pass into the HW arguments. Unfortunately, this will cast entire column as character type which you will need to convert with as.numeric for non-NULL values.

Merge

Group <- c("Primary_Group","Primary_Group","Secondary_Group","Secondary_Group",
           "Tertiary_Group","Tertiary_Group")
Type <- c("A","B","A","B","A","B")
argA <- c("NULL", "NULL", "NULL", "NULL", "NULL", "NULL")
argB <- c(0.05, 0.05, "NULL", "NULL", "NULL", "NULL")
argG <- c("NULL", "NULL", "NULL", "NULL", "NULL", "NULL")

Arguments <- data.frame(Group, Type, argA, argB, argG, stringsAsFactors=FALSE)
df <- merge(df, Arguments, by=c("Group", "Type"))

Dataframe List (with named df elements)

# ORDER FOR NAMING LATER
df <- with(df, df[order(Type, Group),])

# DATAFRAME LIST
df_list <- by(df, df[c("Group", "Type")], identity)
# RENAME LIST
df_list <- setNames(df_list, unique(paste0(df$Group, "_", df$Type, "_df")))

# REFERENCE ELEMENTS
df_list$Primary_Group_A_df
df_list$Secondary_Group_A_df
df_list$Tertiary_Group_A_df
...

HW List (with named hw elements)

# HW LIST
hw_list <- by(df, df[c("Group", "Type")], function(sub) {
  # CONDITIONALLY ASSIGN TEMP VARIABLES 
  # (BEING SUBSETS: max(arg*)==min(arg*)==mean(arg*)==median(arg*))
  if(!is.na(max(sub$argA)) & max(sub$argA) == "NULL") { tmpA <- NULL } 
  else { tmpA <- max(as.numeric(sub$argA)) }

  if(!is.na(max(sub$argB)) & max(sub$argB) == "NULL") { tmpB <- NULL } 
  else { tmpB <- max(as.numeric(sub$argB)) }

  if(!is.na(max(sub$argG)) & max(sub$argG) == "NULL") { tmpG <- NULL } 
  else { tmpG <- max(as.numeric(sub$argG)) }

  # PASS ARGS ONCE PER SUBSET 
  return(HoltWinters(ts(sub, frequency = 365), alpha=tmpA, beta=tmpB, gamma=tmpG))
})

# RENAME LIST
hw_list <- setNames(hw_list, unique(paste0(df$Group, "_", df$Type, "_hw")))

# REFERENCE ELEMENTS
hw_list$Primary_Group_A_hw
hw_list$Secondary_Group_A_hw
hw_list$Tertiary_Group_A_hw
...

Output (using 3 for HW's frequency to align with posted data)

> hw_list$Primary_Group_A_hw
Holt-Winters exponential smoothing with trend and additive seasonal component.

Call:
HoltWinters(x = ts(sub[c("Group", "Day", "Type", "Value")], frequency = 3),     alpha = tmpA, beta = tmpB, gamma = tmpG)

Smoothing parameters:
 alpha: 0.2169231
 beta : 0.05
 gamma: 0.1

Coefficients:
          [,1]
a   2.89129621
b   0.08783715
s1  0.54815382
s2 -0.12485260
s3  0.21087038

> hw_list$Secondary_Group_A_hw
Holt-Winters exponential smoothing with trend and additive seasonal component.

Call:
HoltWinters(x = ts(sub[c("Group", "Day", "Type", "Value")], frequency = 3),     alpha = tmpA, beta = tmpB, gamma = tmpG)

Smoothing parameters:
 alpha: 0.752124
 beta : 0
 gamma: 0

Coefficients:
            [,1]
a   3.691664e+00
b   3.333333e-01
s1  3.333333e-01
s2 -1.480388e-16
s3 -3.333333e-01

> hw_list$Tertiary_Group_A_hw
Holt-Winters exponential smoothing with trend and additive seasonal component.

Call:
HoltWinters(x = ts(sub[c("Group", "Day", "Type", "Value")], frequency = 3),     alpha = tmpA, beta = tmpB, gamma = tmpG)

Smoothing parameters:
 alpha: 0.3145406
 beta : 0
 gamma: 0

Coefficients:
            [,1]
a   3.022946e+00
b  -3.333333e-01
s1 -3.333333e-01
s2 -1.480388e-16
s3  3.333333e-01

Mapply with data.frame/list as the Arguments for the Function

Question

2 answers

solution1
0 2018-06-11 20:33:33

solution2
0 ACCPTED 2018-06-13 19:09:18

Mapply with data.frame/list as the Arguments for the Function

Question

2 answers

solution1 0 2018-06-11 20:33:33

solution2 0 ACCPTED 2018-06-13 19:09:18

solution1
0 2018-06-11 20:33:33

solution2
0 ACCPTED 2018-06-13 19:09:18