Mapply with data.frame/list as the Arguments for the Function

In short, I have a larger function that creates data.frames that are subsets of a larger data.frame and names them after the function's arguments. It builds data.frames for the raw data as well as for the fitted and predictive output of Holt-Winters, so it creates multiple objects. A small example follows (though there are not enough intervals here to actually generate a usable ts object):

Group <- c("Primary_Group","Primary_Group","Primary_Group","Primary_Group","Primary_Group","Primary_Group","Secondary_Group","Secondary_Group","Secondary_Group","Secondary_Group","Secondary_Group","Secondary_Group","Tertiary_Group","Tertiary_Group","Tertiary_Group","Tertiary_Group","Tertiary_Group","Tertiary_Group")
Day <- c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
Type <- c("A","A","A","B","B","B","A","A","A","B","B","B","A","A","A","B","B","B")
Value <- c(7,3,10,3,9,4,0,9,3,10,1,6,3,4,10,2,3,1)
df <- as.data.frame(cbind(Group,Day,Type,Value))

Fun <- function(Group,Type, A, B, G){
    df <- Data[Data$Group== Group & Data$Type== Type, ]
    assign(paste(Group,Type,"_df",sep = ''), df, envir = parent.frame()) 
    df_holtwinters <- HoltWinters(ts(Data[Data$Group== Group & Data$Type== Type, ], 
                                  frequency = 365), alpha = A, beta = B, gamma = G)
    assign(paste(Group,Type,"_hw",sep = ''), df_holtwinters, envir = parent.frame()) 
}

You'll notice that Group and Type are characters, while A, B, and G are either numeric or NULL.

If I now have a data.frame composed of list values, how could I best loop the above function (likely with mapply) so that it uses the values from each column in row one, then each column in row two, and so on, creating several data frames?

argGroup <- c("Primary_Group","Primary_Group","Secondary_Group","Secondary_Group","Tertiary_Group","Tertiary_Group")
argType <- c("A","B","A","B","A","B")
argA <- c(NA, NA, NA, NA, NA, NA)
argB <- c(0.05, 0.05, NA, NA, NA, NA)
argG <- c(NA, NA, NA, NA, NA, NA)

argGroup[is.na(argGroup)] <- list(NULL)
argType[is.na(argType)] <- list(NULL)
argA[is.na(argA)] <- list(NULL)
argB[is.na(argB)] <- list(NULL)
argG[is.na(argG)] <- list(NULL)

Arguments <- cbind(argGroup, argType, argA, argB, argG)

Ideally, the following objects would be generated:

Primary_Group_A_df
Primary_Group_A_hw
Primary_Group_B_df
Primary_Group_B_hw
Secondary_Group_A_df
Secondary_Group_A_hw
Secondary_Group_B_df
Secondary_Group_B_hw
Tertiary_Group_A_df
Tertiary_Group_A_hw
Tertiary_Group_B_df
Tertiary_Group_B_hw

It would also be helpful to understand how best (in the most automated way) to rbind all the _df objects together and all the _hw objects together.

Any help would be amazing and very appreciated. Thanks so much!

You're losing type information by using as.data.frame(cbind(...)); just use data.frame directly:

Data <- data.frame(
  Group = rep(c("Primary_Group", "Secondary_Group", "Tertiary_Group"), each = 6L),
  Day = rep(1L:3L, 6L),
  Type = rep(rep(c("A", "B"), each = 3L), 3L),
  Value = c(7,3,10,3,9,4,0,9,3,10,1,6,3,4,10,2,3,1)
)
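
As a quick check of that type loss, the sketch below builds the same small table both ways and compares the column classes. The shortened vectors and the names via_cbind and direct are made up for illustration.

```r
# Hypothetical miniature of the question's vectors
Group <- c("Primary_Group", "Secondary_Group")
Day   <- c(1, 2)
Value <- c(7, 3)

# cbind() builds a matrix, and a matrix holds a single type,
# so the numeric columns stop being numeric
via_cbind <- as.data.frame(cbind(Group, Day, Value))

# data.frame() keeps each column's own type
direct <- data.frame(Group, Day, Value)

# Compare the resulting column classes
sapply(via_cbind, class)
sapply(direct, class)
```

With the cbind route, Value would need an explicit as.numeric conversion before any arithmetic or time-series fitting.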

Afterwards, I presume you can do the following: 之后,我想您可以执行以下操作:

split_data <- split(Data, as.list(Data[, c("Group", "Type")]))
dfs <- do.call(rbind, split_data)

dfs_hw <- lapply(split_data, function(sub_data) {
  Map(argA, argB, argG, f = function(A, B, G) {
    HoltWinters(ts(sub_data, frequency = 365), alpha = A, beta = B, gamma = G)
  })
})

dfs_hw <- do.call(rbind, unlist(dfs_hw, recursive = FALSE))

But I get an error from HoltWinters, so I can't say for sure. Also, I think dfs simply contains Data again, just reordered.
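
For the row-by-row looping the question asks about, here is a minimal sketch of Map walking several argument lists in parallel, including NULL entries. The function describe_args is a hypothetical stand-in for the real per-subset work (it does not call HoltWinters), and the arg_* names are invented for the example.

```r
# Hypothetical stand-in for the real per-subset work (not HoltWinters itself):
# it only reports which smoothing arguments arrived as NULL
describe_args <- function(Group, Type, A, B, G) {
  list(
    name      = paste(Group, Type, sep = "_"),
    null_args = c(A = is.null(A), B = is.null(B), G = is.null(G))
  )
}

arg_group <- c("Primary_Group", "Primary_Group", "Secondary_Group")
arg_type  <- c("A", "B", "A")

# Lists, unlike atomic vectors, can hold NULL elements
arg_a <- list(NULL, NULL, NULL)
arg_b <- list(0.05, 0.05, NULL)
arg_g <- list(NULL, NULL, NULL)

# Map() walks the five inputs in parallel: element i of each goes to call i
results <- Map(describe_args, arg_group, arg_type, arg_a, arg_b, arg_g)

# Name the results instead of assign()-ing objects into the calling environment
names(results) <- paste(arg_group, arg_type, sep = "_")

results$Primary_Group_A$null_args
```

Returning a named list keeps all results in one place, which makes later combining steps simpler than collecting objects created with assign.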

Avoid flooding your global environment with many similarly structured objects. Consider using a container such as a list to hold the many data frames. One useful function is by, which subsets your data frame by one or more factors, such as Group and Type, and returns a list of data frames. Also, don't iterate by rows; instead, merge the arguments with the data so that each subset receives its arguments in a single pass.

Specifically, call by twice, once for the df list and once for the hw list. But first, merge the df and Arguments data frames by Group and Type. One challenge is that NULL cannot be stored in a data frame, so consider saving the string "NULL" instead and assigning temporary variables to pass into the HoltWinters arguments. Unfortunately, this casts the entire column to character type, so the non-NULL values need to be converted back with as.numeric.
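
The conversion from the "NULL" placeholder strings back to real values can be sketched as a small helper. The name parse_arg is hypothetical, and treating NA the same as "NULL" is an assumption made for this example.

```r
# Hypothetical helper: turn a stored "NULL" placeholder back into a real NULL,
# and anything else into a number
parse_arg <- function(x) {
  value <- unique(x)             # a subset repeats the same value on every row
  stopifnot(length(value) == 1)  # guard against mixed values within one subset
  # Assumption: NA is treated like "NULL", meaning "let HoltWinters estimate it"
  if (is.na(value) || value == "NULL") NULL else as.numeric(value)
}

parse_arg(c("NULL", "NULL", "NULL"))   # NULL
parse_arg(c("0.05", "0.05", "0.05"))   # 0.05
```

A helper like this could replace repeated if/else blocks, one call per smoothing parameter.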

Merge

Group <- c("Primary_Group","Primary_Group","Secondary_Group","Secondary_Group",
           "Tertiary_Group","Tertiary_Group")
Type <- c("A","B","A","B","A","B")
argA <- c("NULL", "NULL", "NULL", "NULL", "NULL", "NULL")
argB <- c(0.05, 0.05, "NULL", "NULL", "NULL", "NULL")
argG <- c("NULL", "NULL", "NULL", "NULL", "NULL", "NULL")

Arguments <- data.frame(Group, Type, argA, argB, argG, stringsAsFactors=FALSE)
df <- merge(df, Arguments, by=c("Group", "Type"))

Dataframe List (with named df elements)

# ORDER FOR NAMING LATER
df <- with(df, df[order(Type, Group),])

# DATAFRAME LIST
df_list <- by(df, df[c("Group", "Type")], identity)
# RENAME LIST
df_list <- setNames(df_list, unique(paste0(df$Group, "_", df$Type, "_df")))

# REFERENCE ELEMENTS
df_list$Primary_Group_A_df
df_list$Secondary_Group_A_df
df_list$Tertiary_Group_A_df
...

HW List (with named hw elements)

# HW LIST
hw_list <- by(df, df[c("Group", "Type")], function(sub) {
  # CONDITIONALLY ASSIGN TEMP VARIABLES 
  # (BEING SUBSETS: max(arg*)==min(arg*)==mean(arg*)==median(arg*))
  if(!is.na(max(sub$argA)) & max(sub$argA) == "NULL") { tmpA <- NULL } 
  else { tmpA <- max(as.numeric(sub$argA)) }

  if(!is.na(max(sub$argB)) & max(sub$argB) == "NULL") { tmpB <- NULL } 
  else { tmpB <- max(as.numeric(sub$argB)) }

  if(!is.na(max(sub$argG)) & max(sub$argG) == "NULL") { tmpG <- NULL } 
  else { tmpG <- max(as.numeric(sub$argG)) }

  # PASS ARGS ONCE PER SUBSET 
  return(HoltWinters(ts(sub, frequency = 365), alpha=tmpA, beta=tmpB, gamma=tmpG))
})

# RENAME LIST
hw_list <- setNames(hw_list, unique(paste0(df$Group, "_", df$Type, "_hw")))

# REFERENCE ELEMENTS
hw_list$Primary_Group_A_hw
hw_list$Secondary_Group_A_hw
hw_list$Tertiary_Group_A_hw
...

Output (using 3 for HW's frequency to align with posted data)

> hw_list$Primary_Group_A_hw
Holt-Winters exponential smoothing with trend and additive seasonal component.

Call:
HoltWinters(x = ts(sub[c("Group", "Day", "Type", "Value")], frequency = 3),     alpha = tmpA, beta = tmpB, gamma = tmpG)

Smoothing parameters:
 alpha: 0.2169231
 beta : 0.05
 gamma: 0.1

Coefficients:
          [,1]
a   2.89129621
b   0.08783715
s1  0.54815382
s2 -0.12485260
s3  0.21087038

> hw_list$Secondary_Group_A_hw
Holt-Winters exponential smoothing with trend and additive seasonal component.

Call:
HoltWinters(x = ts(sub[c("Group", "Day", "Type", "Value")], frequency = 3),     alpha = tmpA, beta = tmpB, gamma = tmpG)

Smoothing parameters:
 alpha: 0.752124
 beta : 0
 gamma: 0

Coefficients:
            [,1]
a   3.691664e+00
b   3.333333e-01
s1  3.333333e-01
s2 -1.480388e-16
s3 -3.333333e-01

> hw_list$Tertiary_Group_A_hw
Holt-Winters exponential smoothing with trend and additive seasonal component.

Call:
HoltWinters(x = ts(sub[c("Group", "Day", "Type", "Value")], frequency = 3),     alpha = tmpA, beta = tmpB, gamma = tmpG)

Smoothing parameters:
 alpha: 0.3145406
 beta : 0
 gamma: 0

Coefficients:
            [,1]
a   3.022946e+00
b  -3.333333e-01
s1 -3.333333e-01
s2 -1.480388e-16
s3  3.333333e-01
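
On the question's last point, binding the pieces back together, the following is a rough sketch rather than a drop-in solution. It uses made-up toy data and swaps in simple exponential smoothing (beta and gamma set to FALSE) so that a short series is enough to fit; the names toy, all_df, and all_hw are invented for the example.

```r
# Hypothetical stand-in data: two groups, 8 observations each
set.seed(1)
toy <- data.frame(
  Group = rep(c("G1", "G2"), each = 8),
  Value = c(cumsum(rnorm(8)), cumsum(rnorm(8)))
)

# One data frame per group, kept together in a named list
df_list <- split(toy, toy$Group)

# Simple exponential smoothing per group (trend and seasonality switched off
# so the short toy series suffices); alpha = NULL lets HoltWinters estimate it
hw_list <- lapply(df_list, function(sub) {
  HoltWinters(ts(sub$Value), alpha = NULL, beta = FALSE, gamma = FALSE)
})

# Stack the data frames back into one
all_df <- do.call(rbind, df_list)

# HoltWinters objects are not tabular, so collect selected components instead
all_hw <- do.call(rbind, lapply(names(hw_list), function(nm) {
  fit <- hw_list[[nm]]
  data.frame(name = nm, alpha = unname(fit$alpha), SSE = fit$SSE)
}))
```

Because a HoltWinters fit is a list-like model object, rbind on the fits themselves is rarely what is wanted; pulling out components such as alpha, SSE, or coefficients into one row per fit gives a table that can be inspected or merged.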
