更改計划的開始日期以優化資源

Question

我有一堆工作需要在特定的時間間隔執行。 但是，我們每天都有足夠的資源來完成這項工作。 因此，我正在嘗試優化開始時間日期（開始時間日期只能向前移動而不是向后移動），這樣每天使用的資源與我們預算的資源更不相似。

這些函數在下面的示例中使用::

# Function to shift/rotate a vector
shifter <- function(x, n = 1) {
  if (n == 0) x else c(tail(x, -n), head(x, n))
}

# Getting a range of dates
get_date_range <- function(current_date = Sys.Date(), next_planned_date = Sys.Date() + 5)
{
  seq.Date(as.Date(current_date), as.Date(next_planned_date), "days")
}

假設玩具示例數據集::此處任務P1在14日開始，而P2從15日開始。 值零表示當天沒有為該任務完成任務。

# EXAMPLE TOY DATASET 
datain = data.frame(dated = c("2018-12-14", "2018-12-15", "2018-12-16", "2018-12-17"), 
                    P1 = c(1,2,0,3), P2 = c(0,4,0,6)) %>%
  mutate(dated = as.character(dated)) 

#The amount of resources that can be  used in a day
max_work = 4

# We will use all the possible combination of start dates to 
# search for the best one
possible_start_dates <- do.call(expand.grid, date_range_of_all)

# Utilisation stores the capacity used during each 
# combination of start dates
# We will use the minimum of thse utilisation
utilisation <- NULL # utilisation difference; absolute value
utilisation_act <-  NULL # actual utilisation including negative utilisation

# copy of data for making changes
ndatain <- datain
# Move data across possible start dates and 
# calculate the possible utilisation in each movements
for(i in 1:nrow(possible_start_dates)) # for every combination
{
  for(j in 1:ncol(possible_start_dates)) # for every plan
  {
    # Number of days that are different
    days_diff = difftime(oriz_start_date[["Plan_Start_Date"]][j], 
                         possible_start_dates[i,j], tz = "UTC", units = "days" ) %>% as.numeric()
    # Move the start dates
    ndatain[, (j+1)] <- shifter(datain[, (j+1)], days_diff)
  }
  if(is.null(utilisation)) # first iteration
  {
    # calculate the utilisation
    utilisation = c(i, abs(max_work - rowSums(ndatain %>% select(-dated))))
    utilisation_act <- c(i, max_work - rowSums(ndatain %>% select(-dated)))
  }else{ # everything except first iteration
    utilisation = rbind(utilisation, c(i,abs(max_work - rowSums(ndatain %>% select(-dated)))))
    utilisation_act <- rbind(utilisation_act, c(i, max_work - rowSums(ndatain %>% select(-dated))))

  }
}

# convert matrix to dataframe 
row.names(utilisation) <-  paste0("Row", 1:nrow(utilisation))
utilisation <- as.data.frame(utilisation)

row.names(utilisation_act) <-  paste0("Row", 1:nrow(utilisation_act))
utilisation_act <- as.data.frame(utilisation_act)

# Total utilisation
tot_util = rowSums(utilisation[-1])

# replace negative utilisation with zero
utilisation_act[utilisation_act < 0]  <- 0
tot_util_act = rowSums(utilisation_act[-1])

# Index of all possible start dates producing minimum utilization changes
indx_min_all = which(min(tot_util) == tot_util)
indx_min_all_act = which(min(tot_util_act) == tot_util_act)

# The minimum possible dates that are minimum of actual utilisation
candidate_dates <- possible_start_dates[intersect(indx_min_all, indx_min_all_act), ]

# Now check which of them are closest to the current starting dates; so that the movement is not much
time_diff <- c()
for(i in 1:nrow(candidate_dates))
{
  # we will add this value in inner loop so here we 
  timediff_indv <- 0
  for(j in 1:ncol(candidate_dates))
  {
    diff_days <- difftime(oriz_start_date[["Plan_Start_Date"]][j], 
                          candidate_dates[i,j], tz = "UTC", units = "days" ) %>% as.numeric()
    # print(oriz_start_date[["Plan_Start_Date"]][j])
    # print(candidate_dates[i,j])
    # 
    # print(diff_days)

    timediff_indv <- timediff_indv + diff_days
  }
  time_diff <- c(time_diff, timediff_indv)
}


# Alternatives
fin_dates  <-  candidate_dates[min(time_diff) == time_diff, ]

上面的代碼運行良好並產生預期的輸出; 但它不能很好地擴展。 我有非常大的數據集（兩年的工作量和超過一千個不同的任務間隔重復），搜索每個可能的組合不是一個可行的選擇。 有沒有辦法可以將此問題表述為標准優化問題，並使用Rglpk或Rcplex或更好的解決方案。 感謝您的投入。

Answer 1

這是我有史以來最長的StackOverflow答案，但我真的很喜歡優化問題。 這是使用單個機器的所謂的作業車間問題的變體，如果您首先將其表示為LP模型 ，您可以使用Rcplex解決這個問題。 然而，這些配方通常表現不佳 ，計算時間可能呈指數增長，這取決於配方。 對於大問題，使用啟發式算法是很常見的，例如遺傳算法 ，這是我經常在這樣的情況下使用的。 它不能保證提供最佳解決方案，但它使我們能夠更好地控制性能與運行時間，並且解決方案通常可以很好地擴展。 基本上，它的工作原理是創建一大組隨機解決方案，稱為人口。 然后我們通過組合解決方案來迭代地更新這個群體來制作“后代”，其中更好的解決方案應該具有更高的創建后代的可能性。

作為評分函數 （以確定哪些解決方案更好'），我使用了每天產能過剩的平方和，這會對一天產能過剩產生很大的影響。 請注意，您可以使用任何您想要的評分功能，因此如果您認為重要，您還可以懲罰容量利用不足。

示例實現的代碼如下所示。 我生成了200天和80任務的一些數據。 它在我的筆記本電腦上運行大約10秒鍾，將隨機解決方案的得分從2634到913 ，提高了65% 。 輸入700天和1000任務，算法仍然在幾分鍾內運行相同的參數。

每次迭代的最佳解決方案得分：

我還包括use_your_own_sample_data ，你可以將其設置為TRUE ，讓算法解決一個更簡單，更小的例子來確認它給出了預期的輸出：

     dated P1 P2 P3 P4 P5                dated P1 P2 P3 P4 P5
2018-12-14  0  0  0  0  0           2018-12-14  0  0  3  1  0
2018-12-15  0  0  0  0  0           2018-12-15  0  3  0  0  1
2018-12-16  0  0  0  0  0   ---->   2018-12-16  0  0  3  1  0
2018-12-17  0  3  3  1  1           2018-12-17  0  3  0  0  1
2018-12-18  4  0  0  0  0           2018-12-18  4  0  0  0  0
2018-12-19  4  3  3  1  1           2018-12-19  4  0  0  0  0

我希望這有幫助！ 如果您對此有更多疑問，請與我們聯系。

碼

### PARAMETERS -------------------------------------------

n_population = 100 # the number of solutions in a population
n_iterations = 100 # The number of iterations
n_offspring_per_iter = 80 # number of offspring to create per iteration
max_shift_days = 20 # Maximum number of days we can shift a task forward
frac_perm_init = 0.25 # fraction of columns to change from default solution while creating initial solutions
early_stopping_rounds = 100 # Stop if score not improved for this amount of iterations
capacity_per_day = 4

use_your_own_sample_data = FALSE # set to TRUE to use your own test case


### SAMPLE DATA -------------------------------------------------
# datain should contain the following columns:
# dated: A column with sequential dates
# P1, P2, ...: columns with values for workload of task x per date

n_days = 200
n_tasks = 80

set.seed(1)
if(!use_your_own_sample_data)
{
  # my sample data:
  datain = data.frame(dated = seq(Sys.Date()-n_days,Sys.Date(),1))
  # add some random tasks
  for(i in 1:n_tasks)
  {
    datain[[paste0('P',i)]] = rep(0,nrow(datain))
    rand_start = sample(seq(1,nrow(datain)-5),1)
    datain[[paste0('P',i)]][seq(rand_start,rand_start+4)] = sample(0:5,5,replace = T)
  }
}  else 
{
  # your sample data:
  library(dplyr)
  datain = data.frame(dated = c("2018-12-14", "2018-12-15", "2018-12-16", "2018-12-17","2018-12-18","2018-12-19"), 
                      P1 = c(0,0,0,0,4,4), P2 = c(0,0,0,3,0,3), P3=c(0,0,0,3,0,3), P4=c(0,0,0,1,0,1),P5=c(0,0,0,1,0,1)) %>%
    mutate(dated = as.Date(dated,format='%Y-%m-%d')) 
}
tasks = setdiff(colnames(datain),c("dated","capacity")) # a list of all tasks
# the following vector contains for each task its maximum start date
max_date_per_task = lapply(datain[,tasks],function(x) datain$dated[which(x>0)[1]])



### ALL OUR PREDEFINED FUNCTIONS ----------------------------------

# helper function to shift a task
shifter <- function(x, n = 1) {
  if (n == 0) x else c(tail(x, n), head(x, -n))
}

# Score a solution
# We calculate the score by taking the sum of the squares of our overcapacity (so we punish very large overcapacity on a day)
score_solution <- function(solution,tasks,capacity_per_day)
{
  cap_left = capacity_per_day-rowSums(solution[,tasks]) # calculate spare capacity
  over_capacity = sum(cap_left[cap_left<0]^2) # sum of squares of overcapacity (negatives)
  return(over_capacity)
}

# Merge solutions
# Get approx. 50% of tasks from solution1, and the remaining tasks from solution 2.
merge_solutions <- function(solution1,solution2,tasks)
{
  tasks_from_solution_1 = sample(tasks,round(length(tasks)/2))
  tasks_from_solution_2 = setdiff(tasks,tasks_from_solution_1)
  new_solution = cbind(solution1[,'dated',drop=F],solution1[,tasks_from_solution_1,drop=F],solution2[,tasks_from_solution_2,drop=F])
  return(new_solution)
}

# Randomize solution
# Create an initial solution
randomize_solution <- function(solution,max_date_per_task,tasks,tasks_to_change=1/8)
{
  # select some tasks to reschedule
  tasks_to_change = max(1, round(length(tasks)*tasks_to_change))
  selected_tasks <- sample(tasks,tasks_to_change)
  for(task in selected_tasks)
  {
    # shift task between 14 and 0 days forward
    new_start_date <- sample(seq(max_date_per_task[[task]]-max_shift_days,max_date_per_task[[task]],by='day'),1)
    new_start_date <- max(new_start_date,min(solution$dated))
    solution[,task] = shifter(solution[,task],as.numeric(new_start_date-max_date_per_task[[task]]))
  }
  return(solution)
}

# sort population based on scores
sort_pop <- function(population)
{
  return(population[order(sapply(population,function(x) {x[['score']]}),decreasing = F)])
}

# return the scores of a population
pop_scores <- function(population)
{
  sapply(population,function(x) {x[['score']]})
}



### RUN SCRIPT -------------------------------

# starting score
print(paste0('Starting score: ',score_solution(datain,tasks,capacity_per_day)))

# Create initial population
population = vector('list',n_population)
for(i in 1:n_population)
{
  # create initial solutions by making changes to the initial solution 
  solution = randomize_solution(datain,max_date_per_task,tasks,frac_perm_init)
  score = score_solution(solution,tasks,capacity_per_day)
  population[[i]] = list('solution' = solution,'score'= score)
}

population = sort_pop(population)

score_per_iteration <- score_solution(datain,tasks,capacity_per_day)
# Run the algorithm
for(i in 1:n_iterations)
{
  print(paste0('\n---- Iteration',i,' -----\n'))

  # create some random perturbations in the population
  for(j in 1:10)
  {
    sol_to_change = sample(2:n_population,1)
    new_solution <- randomize_solution(population[[sol_to_change]][['solution']],max_date_per_task,tasks)
    new_score <- score_solution(new_solution,tasks,capacity_per_day)
    population[[sol_to_change]] <- list('solution' = new_solution,'score'= new_score)
  }

  # Create offspring, first determine which solutions to combine
  # determine the probability that a solution will be selected to create offspring (some smoothing)
  probs = sapply(population,function(x) {x[['score']]})
  if(max(probs)==min(probs)){stop('No diversity in population left')}
  probs = 1-(probs-min(probs))/(max(probs)-min(probs))+0.2
  # create combinations
  solutions_to_combine = lapply(1:n_offspring_per_iter, function(y){
    sample(seq(length(population)),2,prob = probs)})
  for(j in 1:n_offspring_per_iter)
  {
    new_solution <- merge_solutions(population[[solutions_to_combine[[j]][1]]][['solution']],
                                    population[[solutions_to_combine[[j]][2]]][['solution']],
                                    tasks)
    new_score <- score_solution(new_solution,tasks,capacity_per_day)
    population[[length(population)+1]] <- list('solution' = new_solution,'score'= new_score)
  }
  population = sort_pop(population)
  population= population[1:n_population]
  print(paste0('Best score:',population[[1]]['score']))
  score_per_iteration = c(score_per_iteration,population[[1]]['score'])
  if(i>early_stopping_rounds+1)
  {
    if(score_per_iteration[[i]] == score_per_iteration[[i-10]])
    {
      stop(paste0("Score not improved in the past ",early_stopping_rounds," rounds. Halting algorithm."))
    }
  }
}

plot(x=seq(0,length(score_per_iteration)-1),y=score_per_iteration,xlab = 'iteration',ylab='score')
final_solution = population[[1]][['solution']]
final_solution[,c('dated',tasks)]

事實上，正如我們所期望的那樣，該算法在減少產能過高的天數方面非常出色：

final_solution = population[[1]][['solution']]

# number of days with workload higher than 10 in initial solution
sum(rowSums(datain[,tasks])>10)
> 19

# number of days with workload higher than 10 in our solution
sum(rowSums(final_solution[,tasks])>10)
> 1

更改計划的開始日期以優化資源

問題描述

1 個解決方案

解決方案1
5 已采納 2018-01-13 16:14:58

碼

更改計划的開始日期以優化資源

問題描述

1 個解決方案

解決方案1 5 已采納 2018-01-13 16:14:58

碼

解決方案1
5 已采納 2018-01-13 16:14:58