Optimization in R: cost function with binary scheduling variables?

Question

The below details a simplified version of an optimization problem that I am having trouble solving.

The objective is to minimize a cost function for an organization that delivers water via trucks and use that equation to produce a truck delivery schedule that minimizes costs.

The organization delivers water to ~10,000 household tanks throughout the year.

The tanks have a maximum capacity of 300 gallons and minimum desired limit of 100 gallons -- that is, the tanks should be topped up to 300 before they go below 100.

For example, if the tank is 115 gallons on week 2 and is estimated to use 20 gallons in week 3, it needs to be refilled in week 3.

The costs include:

A per-delivery fee of $10
The weekly cost of trucks. The weekly cost of a truck is $1,000. So if 200 deliveries are in a single week the cost is $3,000 (200 * 10 + 1000 * 1) .If 201 deliveries are made, the cost jumps significantly to $4,010 (201 * 10 + 1000 * 2) .

Water usage varies across households and across weeks. Peak water usage is in the summer. If we blindly followed the rule to refill just before hitting the 100 gallon minimum limit, then it is likely that the peak number of trucks would be higher than needed if the deliveries were spread out into the "shoulders" of the summer.

I have created estimates of weekly water usage for each week for each household. Furthermore, I have grouped like households to reduce the size of the optimization problem (~10k households down to 8 groups).

To restate the goal: The output of this optimizer should be: deliver or not, for each household group, for each of the 52 weeks in a year.

Simplified data (ie, for 8 groups and 12 weeks):

df.usage <-  structure(list(reduction.group = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 
                                                1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 
                                                3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 
                                                5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 
                                                7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 
                                                8, 8, 8), week = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 
                                                                   2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 
                                                                   10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 
                                                                   5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 
                                                                   12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 
                                                                   7, 8, 9, 10, 11, 12), water_usage = c(46, 50, 42, 47, 43, 39, 
                                                                                                         38, 32, 42, 36, 42, 30, 46, 50, 42, 47, 43, 39, 38, 32, 42, 36, 
                                                                                                         42, 30, 46, 50, 43, 47, 43, 39, 38, 32, 42, 36, 42, 30, 46, 50, 
                                                                                                         43, 47, 43, 39, 38, 32, 42, 36, 42, 30, 29, 32, 27, 30, 27, 25, 
                                                                                                         24, 20, 26, 23, 27, 19, 29, 32, 27, 30, 27, 25, 24, 20, 26, 23, 
                                                                                                         27, 19, 29, 32, 27, 30, 28, 25, 25, 21, 27, 23, 27, 19, 29, 32, 
                                                                                                         27, 30, 28, 25, 25, 21, 27, 23, 27, 20), tank.level.start = c(115, 
                                                                                                                                                                        NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 165, NA, NA, NA, 
                                                                                                                                                                        NA, NA, NA, NA, NA, NA, NA, NA, 200, NA, NA, NA, NA, NA, NA, 
                                                                                                                                                                        NA, NA, NA, NA, NA, 215, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
                                                                                                                                                                        NA, NA, 225, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 230, 
                                                                                                                                                                        NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 235, NA, NA, NA, 
                                                                                                                                                                        NA, NA, NA, NA, NA, NA, NA, NA, 240, NA, NA, NA, NA, NA, NA, 
                                                                                                                                                                        NA, NA, NA, NA, NA)), row.names = c(NA, 96L), class = "data.frame")

Tank level refill rules

Here is a nested set of loops for determining tank levels over time with a "refill" logic:

library(dplyr)

reduction.groups <- unique(df.usage$reduction.group)
df.after.refill.logic <- list()

for (i in reduction.groups) {

  temp <- df.usage %>% filter(reduction.group == i)
  temp$refilled <- 0
  temp$level <- temp$tank.level.start

  n <- nrow(temp)

  if (n > 1) for (j in 2:n) {
    temp$level[j] <- ( temp$level[j-1] - temp$water_usage[j] )
    if(temp$level[j] < 100) {
      temp$level[j] <- 300
      temp$refilled[j] <- 1
    }
  }
  df.after.refill.logic <- bind_rows(df.after.refill.logic, temp)
}

Decision Variables

Delivery or not to each group, each week of the year (Binary)

Constraints

No partial trucks: number of trucks must be integers

Truck capacity: truck deliveries/week <= 200

Tanks can't go below 100 gallons: level >= 100

Delivery must be binary

Constants

1600 # truck_weekly_costs
10 # cost_per_delivery
200 # weekly_delivery_capacity_per_truck

Example Cost Function

weekly_cost_function <- function(i){
  cost <- (ceiling(sum(i)/200)) * 1600 + (sum(i) * 10)
  cost
}

**example cost for one week with i = 199 deliveries:**
weekly_cost_function(i = 199)
[1] 3590

Attempt to Model the Problem using OMPR

Below is the beginning of a model created with the OMPR package (although using another package would be okay):

I am confused about how to set this up using the data above. Three obvious problems:

How can I include the ceiling logic expressed in the Example Cost Function in the OMPR code?
The model below isn't incorporating the data in the dataframe above (df.usage). The goal is for an optimizer to generate values for the "refilled" and "level" variables based on the four variables (reduction.group, week, water_usage, tank_level_start), along with the constants.
The refill logic I wrote in the "determining tank levels" loop above isn't incorporated. Should that be added as a constraint? If so, how?

num_groups <- length(unique(df.usage$reduction.group))
num_weeks <- length(unique(df.usage$week))

MIPModel() %>%
  add_variable(x[i,w],                         # create decision variable: deliver or not by...
               i = 1:num_groups,               # group,
               w = 1:num_weeks,                # in week.
               type = "integer",               # Integers only
               lb = 0, ub = 1) %>%             # between 0 and 1, inclusive 
  set_objective(sum_expr( x[i,w]/200 * 1600 + x[i,w] * 10,
                          i = 1:num_groups, 
                          w = 1:num_weeks),
                sense = "min") %>%
  # add constraint to achieve ceiling(x[i,w]/200), or should this be in the set_objective call?
  add_constraint(???) %>%
  solve_model(with_ROI("glpk"))

Desired Output

Here is what an example head() output would look like:


 reduction.group   week   water.usage  refill   level
               1      1            46       0     115
               1      2            50       1     300
               1      3            42       0     258
               1      4            47       0     211
               1      5            43       0     168
               1      6            39       0     129

Importantly, the refill values would be whatever minimizes the cost function and keeps the level above 100.

Answer 1

The ceiling function is a difficult non-linear function (non-differentiable, not continuous), and should be avoided at all cost. However it can be modeled quite easily with general integer variables. For non-negative variables x >= 0 we can formulate

y = ceiling(x)

as

x <= y <= x+1
y integer

This is completely linear and is trivial to implement in OMPR (or in any other LP/MIP tool).

Detail note. This formulation will allow the model to choose y=x or y=x+1 in the special case where x assumes an integer value. If you want to be picky about this case, you can do:

x+0.0001 <= y <= x+1
y integer

I would not worry about this.

Answer 2

With the ceiling function, this seems like a difficult problem for a hill-climbing optimizer. I think genetic algorithms are a better fit. The matrix of deliver-or-not for each house each week makes a nice genome.

library(dplyr)

# Original given sample input data.
df.usage <-  structure(list(reduction.group = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 
                                                1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 
                                                3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 
                                                5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 
                                                7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 
                                                8, 8, 8), week = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 
                                                                   2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 
                                                                   10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 
                                                                   5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 
                                                                   12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 
                                                                   7, 8, 9, 10, 11, 12), water_usage = c(46, 50, 42, 47, 43, 39, 
                                                                                                         38, 32, 42, 36, 42, 30, 46, 50, 42, 47, 43, 39, 38, 32, 42, 36, 
                                                                                                         42, 30, 46, 50, 43, 47, 43, 39, 38, 32, 42, 36, 42, 30, 46, 50, 
                                                                                                         43, 47, 43, 39, 38, 32, 42, 36, 42, 30, 29, 32, 27, 30, 27, 25, 
                                                                                                         24, 20, 26, 23, 27, 19, 29, 32, 27, 30, 27, 25, 24, 20, 26, 23, 
                                                                                                         27, 19, 29, 32, 27, 30, 28, 25, 25, 21, 27, 23, 27, 19, 29, 32, 
                                                                                                         27, 30, 28, 25, 25, 21, 27, 23, 27, 20), tank.level.start = c(115, 
                                                                                                                                                                       NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 165, NA, NA, NA, 
                                                                                                                                                                       NA, NA, NA, NA, NA, NA, NA, NA, 200, NA, NA, NA, NA, NA, NA, 
                                                                                                                                                                       NA, NA, NA, NA, NA, 215, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
                                                                                                                                                                       NA, NA, 225, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 230, 
                                                                                                                                                                       NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 235, NA, NA, NA, 
                                                                                                                                                                       NA, NA, NA, NA, NA, NA, NA, NA, 240, NA, NA, NA, NA, NA, NA, 
                                                                                                                                                                       NA, NA, NA, NA, NA)), row.names = c(NA, 96L), class = "data.frame")

# Orginal given delivery cost function.
weekly_cost_function <- function(i){
  cost <- (ceiling(sum(i)/200)) * 1600 + (sum(i) * 10)
  cost
}

# Calculate the list of houses (reduction.groups) and number of delivery weeks (weeks).
reduction.groups <- unique(df.usage$reduction.group)
temp             <- df.usage %>% filter(reduction.group == 1)
weeks            <- nrow(temp)

# The genome consists of a matrix representing deliver-or-not to each house each week.
create_random_delivery_schedule <- function(number_of_houses, number_of_weeks, prob = NULL) {
  matrix(sample(c(0, 1), number_of_houses * number_of_weeks, replace = TRUE, prob = prob), number_of_houses)
}

# Generate a population of random genes.
population_size <- 100
schedules <- replicate(population_size, create_random_delivery_schedule(length(reduction.groups), weeks), simplify = FALSE)

# Calculate fitness of an individual.
fitness <- function(schedule) {

  # Fitness is related to delivery cost.
  delivery_cost <- sum(apply(schedule, 2, weekly_cost_function))

  # If the schedule allows a tank level to drop below 100, apply a fitness penalty.
  # Don't make the fitness penalty too large.
  # If the fitness penalty is large enough to be catastrophic (essentially zero children)
  # then solutions that are close to optimal will also be likely to generate children
  # who fall off the catastropy cliff so there will be a selective pressure away from
  # close to optimal solutions.
  # However, if your optimizer generates a lot of infeasible solutions raise the penalty.
  for (i in reduction.groups) {

    temp <- df.usage %>% filter(reduction.group == i)
    temp$level <- temp$tank.level.start

    if (weeks > 1) for (j in 2:weeks) {
      if (1 == schedule[i,j]) {
        temp$level[j] <- 300
      } else {
        temp$level[j] <- ( temp$level[j-1] - temp$water_usage[j] )

        if (100 > temp$level[j]) {
          # Fitness penalty.
          delivery_cost <- delivery_cost + 10 * (100 - temp$level[j])
        }
      }
    }
  }

  # Return one over delivery cost so that lower cost is higher fitness.
  1 / delivery_cost
}

# Generate a new schedule by combining two parents chosen randomly weighted by fitness.
make_baby <- function(population_fitness) {

  # Choose some parents.
  parents <- sample(length(schedules), 2, prob = population_fitness)

  # Get DNA from mommy.
  baby <- schedules[[parents[1]]]

  # Figure out what part of the DNA to get from daddy.
  house_range <- sort(sample(length(reduction.groups), 2))
  week_range  <- sort(sample(weeks, 2))

  # Get DNA from daddy.
  baby[house_range[1]:house_range[2],week_range[1]:week_range[2]] <- schedules[[parents[2]]][house_range[1]:house_range[2],week_range[1]:week_range[2]]

  # Mutate, 1% chance of flipping each bit.
  changes <- create_random_delivery_schedule(length(reduction.groups), weeks, c(0.99, 0.01))
  baby <- apply(xor(baby, changes), c(1, 2), as.integer)
}

lowest_cost <<- Inf

# Loop creating and evaluating generations.
for (ii in 1:100) {
  population_fitness <- lapply(schedules, fitness)
  lowest_cost_this_generation <- 1 / max(unlist(population_fitness))
  print(sprintf("lowest cost = %f", lowest_cost_this_generation))

  if (lowest_cost_this_generation < lowest_cost) {
    lowest_cost <<- lowest_cost_this_generation
    best_baby   <<- schedules[[which.max(unlist(population_fitness))]]
  }

  schedules <<- replicate(population_size, make_baby(population_fitness), simplify = FALSE)
}

Optimization in R: cost function with binary scheduling variables?

Question

2 answers

solution1
4 2019-04-30 01:21:08

solution2
2 ACCPTED 2019-05-08 20:06:31

Optimization in R: cost function with binary scheduling variables?

Question

2 answers

solution1 4 2019-04-30 01:21:08

solution2 2 ACCPTED 2019-05-08 20:06:31

solution1
4 2019-04-30 01:21:08

solution2
2 ACCPTED 2019-05-08 20:06:31