I am trying to find the most efficient way to repeatedly search for combinations of two variables in a reference table. The problem arises from an implementation of a hill-climbing algorithm with an annealing step size, which adds a lot of complexity.
To explain, say I have two variables A and B that I want to optimise. I start with 100 combinations of these variables, which I will iterate through:
set.seed(100)
A_start <- sample(1000, 10, rep = FALSE)
B_start <- sample(1000, 10, rep = FALSE)
A_B_starts <- expand.grid(A = A_start,
                          B = B_start)
head(A_B_starts)
A B
1 714 823
2 503 823
3 358 823
4 624 823
5 985 823
6 718 823
For each of these start combinations, I want to evaluate their immediate neighbours in a predictive model and, if a neighbour's error is less than that of the start combination, continue in that direction. This is repeated until a maximum number of iterations is hit or the error increases (standard hill climbing).

However, I do not want to re-check combinations I have already looked at, so I use a reference table to store checked combinations. Each time I generate the immediate neighbours, I check whether they are in the reference table before running the predictive model; any that are present are simply removed. More complexity is added because I want the step size that generates the immediate neighbours to anneal, i.e. get smaller over time.

I have implemented this using data.table:
max_iterations <- 1e+06
#Set max size so it is efficient to add new combinations; max size is
#100 start points by the maximum number of iterations allowed
ref <- data.table(A = numeric(),
                  B = numeric(),
                  key = c("A", "B"))[1:(100 * max_iterations)]
ref
A B
1: NA NA
2: NA NA
3: NA NA
4: NA NA
5: NA NA
---
99999996: NA NA
99999997: NA NA
99999998: NA NA
99999999: NA NA
100000000: NA NA
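The check against already-visited combinations described above is essentially an anti-join. A minimal standalone sketch (with made-up values, independent of the loop below):

```r
library(data.table)

# Hypothetical values: two pairs already visited, two candidate neighbours
ref_demo   <- data.table(A = c(714, 503), B = c(823, 823), key = c("A", "B"))
candidates <- data.table(A = c(714, 999), B = c(823, 823))

# Anti-join: keep only the candidates not present in ref_demo
unvisited <- candidates[!ref_demo, on = c("A", "B")]
unvisited  # one row left: A = 999, B = 823
```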
So, the loop that actually works through the problem:
step_A <- 5
step_B <- 5
visited_counter <- 1L
for (start_i in 1:nrow(A_B_starts)) {
  initial_error <- get.error.pred.model(A_B_starts[start_i, ])
  A <- A_B_starts[start_i, 1]
  B <- A_B_starts[start_i, 2]
  #Add start i to checked combinations
  set(ref, i = visited_counter, j = "A", value = A)
  set(ref, i = visited_counter, j = "B", value = B)
  visited_counter <- visited_counter + 1L
  iterations <- 1
  while (iterations < max_iterations) {
    #Anneal step
    decay_A <- step_A / iterations
    decay_B <- step_B / iterations
    step_A <- step_A * 1 / (1 + decay_A * iterations)
    step_B <- step_B * 1 / (1 + decay_B * iterations)
    #Get neighbours to check
    to_visit_A <- c(A + step_A, A - step_A)
    to_visit_B <- c(B + step_B, B - step_B)
    to_visit <- setDT(expand.grid(A = to_visit_A, B = to_visit_B),
                      key = c("A", "B"))
    #Check whether any combination has been visited before and remove it if so.
    #The key must be reset inside the loop because values in ref are updated,
    #which invalidates the key.
    setkey(ref, A, B)
    prev_visited <- ref[to_visit, nomatch = 0L]
    to_visit <- to_visit[!prev_visited]
    #Run the model on the remaining combinations; continue if the error drops
    best_neighbour <- get.min.error.pred.model(to_visit)
    if (best_neighbour$error < initial_error) {
      initial_error <- best_neighbour$error
      A <- best_neighbour$A
      B <- best_neighbour$B
    } else {
      iterations <- max_iterations
    }
    #Add all checked combinations to the reference and update the iteration count
    for (visit_i in 1L:nrow(to_visit)) {
      #This invalidates the key of the data.table
      set(ref, i = visited_counter, j = "A", value = to_visit$A[visit_i])
      set(ref, i = visited_counter, j = "B", value = to_visit$B[visit_i])
      visited_counter <- visited_counter + 1L
      iterations <- iterations + 1
    }
  }
}
The problem with this approach is that I have to reset the key on every loop iteration, because new combinations have been added to ref, which makes it very slow:
setkey(ref, A, B)
prev_visited <- ref[to_visit, nomatch = 0L]
to_visit <- to_visit[!prev_visited]
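One way to sidestep the re-keying entirely (my own sketch, not part of the original implementation) is to keep the visited set in an R environment, which is backed by a hash table, and build a string key from each pair. For non-integer A/B values the coordinates would need a consistent string representation (e.g. `format()` with fixed digits) before pasting:

```r
# An environment with hash = TRUE acts as a hash table, so membership
# checks are O(1) and no data.table key ever needs resetting.
visited <- new.env(hash = TRUE)

# Hypothetical helper: build a single string key from an (A, B) pair.
# Caveat: format non-integer values consistently before pasting.
pair_key <- function(A, B) paste(A, B, sep = "_")

mark_visited <- function(A, B) {
  assign(pair_key(A, B), TRUE, envir = visited)
}

was_visited <- function(A, B) {
  exists(pair_key(A, B), envir = visited, inherits = FALSE)
}

mark_visited(714, 823)
was_visited(714, 823)  # TRUE
was_visited(503, 823)  # FALSE
```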
Also, the reason I mention the annealing is that I had another idea: use a sparse matrix (the Matrix package) to hold indicators of pairs already checked, which would allow very quick look-ups:
require(Matrix)
#Use a sparse matrix for efficient search and optimum RAM usage
sparse_matrix <- sparseMatrix(i = 1:(100 * 1e+06),
                              j = 1:(100 * 1e+06))
However, since the step size is variable, i.e. A/B can hold any value at increasingly small intervals, I don't know how to initialise appropriate values of A and B in the sparse matrix to capture all possible combinations checked.
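One possible workaround (an assumption on my part, not a full answer): discretise the coordinates to a fixed resolution so that every visited pair maps to integer indices. Anything finer than the chosen resolution collapses into the same cell, so this only works if a minimum meaningful step size can be fixed in advance. Here `res` and `max_val` are hypothetical:

```r
library(Matrix)

res     <- 0.001                      # hypothetical smallest step worth distinguishing
max_val <- 1000                       # hypothetical upper bound on A and B
n       <- as.integer(max_val / res)  # 1e6 cells per dimension

# Empty logical sparse matrix: only cells actually visited take memory
visited <- sparseMatrix(i = integer(0), j = integer(0), x = logical(0),
                        dims = c(n, n))

# Map a continuous coordinate to an integer grid index
to_index <- function(x) as.integer(round(x / res))

# Mark (714, 823) as visited, then check it
visited[to_index(714), to_index(823)] <- TRUE
visited[to_index(714), to_index(823)]  # TRUE
```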
(Not really an answer, but too long for a comment.)
If the number of possible solutions is huge, it might be impractical or impossible to store them all. What is more, the fastest way to look up a single solution is generally a hash table; but setting up the hash table is slow, so you might not gain much (your objective function needs to be more expensive than this set-up/look-up overhead).

Depending on the problem, much of this stored information may be wasted, since the algorithm may never revisit those solutions. An alternative suggestion would be a first-in/first-out data structure, which simply stores the last N solutions that have been visited. (For a short list, even a linear look-up may be faster than working with a repeatedly set-up hash table.) But in any case, I'd start by testing whether, and how often, the algorithm actually revisits a particular solution.
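A minimal sketch of such a first-in/first-out buffer (my own illustration; the size N is arbitrary): a fixed-size matrix used as a ring buffer, with a linear scan for membership:

```r
# Ring buffer holding the last N visited (A, B) pairs
N      <- 1000L
recent <- matrix(NA_real_, nrow = N, ncol = 2)
pos    <- 0L

remember <- function(A, B) {
  pos <<- (pos %% N) + 1L       # advance and wrap, overwriting the oldest entry
  recent[pos, ] <<- c(A, B)
}

seen_recently <- function(A, B) {
  any(recent[, 1] == A & recent[, 2] == B, na.rm = TRUE)
}

remember(714, 823)
seen_recently(714, 823)  # TRUE
seen_recently(503, 823)  # FALSE
```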