简体   繁体   中英

Multi-threaded bisection search

Suppose that I have some computable predicate, P, that maps integers to bools. I know that P 0 is true and I know some N such that PN is false. I also know that P n = false implies that P (n + 1) is false [*]. I want to find the maximum n such that P n is true.

Clearly, I can find a solution to this by bisection. Unfortunately, evaluating P is expensive (maybe it takes an hour or so). I also have a shiny server with many cores.

If evaluating P took constant time and I had 2 threads (say), I can see how I would do the search. I'd divide the interval [a, b] into three segments and evaluate P (2a + b)/3 and P (a + 2b)/3 . Once both evaluations had completed, I'd know which of the three segments to recurse into. Using two threads, my search time would be reduced by a third. Excellent!

However, what if evaluating P takes wildly varying times? In my concrete example, it might take between 10 seconds and an hour or so. Suppose again that I have 2 threads and have divided the interval as above. Maybe the first thread (evaluating P (2a+b)/3 ) finishes first. How do I decide where to run the next one?

I guess there's some link to information theory or similar, because I'm trying to run the test that will give me the most information, given my current knowledge. But this seems like it should be a problem that others have investigated - can anyone point me at papers or similar?

[*] In the concrete case I care about, the test involves running an SMT solver trying to find a solution X to a constraint problem with one extra constraint of the form X ≥ n, where n is the integer above.

If you are looking for a paper reference, you might get more traction on CS.SE . Here I can only offer some heuristics.

Whenever a thread finishes, you can stop all other threads whose answer you now know (ie, if you got P(n)=T , you can stop all thread working on k<n , and if P(n)=F , you can stop all threads working on k>n ). Thus you now have 1 or more threads to start.

From the information-theoretic POV, dividing the existing intervals to minimize the maximum length of the new intervals is obviously optimal.

However, since you note in a comment:

the SMT solver takes much longer for satisfiable solutions

It might be faster to start with a large n and go down slowly (eg, if you know that P(100)=F and P(1)=T , test 95, 90, 80 in 3 threads, instead of 25, 50, 75 as recommended by the information theory).

You can also use the running time as an indicator of the probable result. Eg, start your 3 threads at n=25,50,75 . Suppose in 1 minute you learn P(75)=F but the other two are still running. Then you can puth the n=25 thread to sleep (to be awoken or killed in the future as necessary) and start two new threads for 60 and 70.

If there is no more knowledge about P evaluation time, like statistical properties, or some connection to parameter n, than probably it is the best strategy to use evaluation that is first finished and not wait for other started evaluations. That is since long lasting evaluation can last few hundred times more than fast one. Few fast evaluations can reduce search interval quite fast, so long lasting ones will not be needed to evaluate at all.

I would try strategy that is binary search that starts more evaluations on interval half. Eg if interval [0, 100] has to be checked and there are 8 threads than start evaluations for n=[47, ..., 54]. When first evaluation finish, kill evaluations that can't be result, pause other evaluations, and proceed with ~half of previous interval. When interval is slightly larger than number of threads (~1.5x) use some strategy to cover whole interval with evaluations, since to find result you have to check both neighbours. There will be no more than 2*num_threads paused evaluations.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM