
Parallel Normal Distributions

I'm working on a simulation where a large task is completed by a series of independent smaller tasks, run either in parallel or in series. Each smaller task's completion time follows a normal distribution with mean "t" and variance "v". I understand that if this task is repeated in series "n" times, then the total time is normally distributed with mean t*n and variance v*n, which is nice. What I don't know is what happens to the mean and variance if a set of the same tasks is done simultaneously, in parallel; it's been a while since prob/stat class. Is there a nice, fast way to find the new time distribution for "n" of these independent, normally distributed tasks done in parallel?

If the tasks are undertaken independently and in parallel, the distribution of time until completion depends on the time of the longest process.

Unfortunately, the max function doesn't have particularly nice properties for theoretical analysis, but if you're already simulating there's an easy way to do it. For each subprocess i with mean t_i and variance v_i, draw a completion time independently, then take the largest of the draws. Repeating this many times gives you a set of samples from the max distribution you're interested in, from which you can compute the expectation (average), variance, or whatever else you want.
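The sampling procedure above can be sketched in a few lines of Python. This is only an illustration using the standard library; the function name `simulate_parallel_time`, the trial count, and the example values (five tasks with mean 10 and variance 4) are assumptions, not part of the question.

```python
import random
import statistics

def simulate_parallel_time(means, variances, n_trials=100_000, seed=0):
    """Estimate the mean and variance of the finish time of parallel tasks."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sigmas = [v ** 0.5 for v in variances]  # gauss() expects a standard deviation
    maxima = []
    for _ in range(n_trials):
        # Draw one completion time per task; the job ends when the slowest finishes.
        maxima.append(max(rng.gauss(m, s) for m, s in zip(means, sigmas)))
    return statistics.fmean(maxima), statistics.variance(maxima)

# Hypothetical example: 5 identical tasks, each with mean 10 and variance 4.
mean_max, var_max = simulate_parallel_time([10.0] * 5, [4.0] * 5)
print(mean_max, var_max)
```

Because the means and variances are passed as lists, the same sketch also covers tasks that are not identically distributed.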

The question is: what is the distribution of the maximum (greatest value) of the random completion times? The cumulative distribution function (i.e. the integral of the probability density from minus infinity up to a given point) of the maximum of a collection of independent random variables is just the product of the distribution functions of the individual variables. (The distribution function of the minimum is 1 - (product of (1 - distribution function)).)
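As a small illustration of the product rule, here is a sketch using `statistics.NormalDist` from the Python standard library (Python 3.8+). The helper names `cdf_of_max` and `cdf_of_min` and the example parameters are made up for this example.

```python
from statistics import NormalDist

def cdf_of_max(x, dists):
    """P(max <= x): every independent task must have finished by time x."""
    p = 1.0
    for d in dists:
        p *= d.cdf(x)
    return p

def cdf_of_min(x, dists):
    """P(min <= x): one minus the chance that every task is still running at x."""
    q = 1.0
    for d in dists:
        q *= 1.0 - d.cdf(x)
    return 1.0 - q

# Hypothetical example: 5 tasks, each with mean 10 and standard deviation 2.
tasks = [NormalDist(mu=10.0, sigma=2.0) for _ in range(5)]
print(cdf_of_max(12.0, tasks))  # chance that all 5 are done by t = 12
print(cdf_of_min(10.0, tasks))  # chance that at least one is done by t = 10
```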

If you want to find a time such that P(maximum > time) equals some given value, you may be able to solve for it exactly; otherwise, resort to a numerical method. Solving the equation numerically (e.g. with the bisection method) is much faster and more accurate than the Monte Carlo approach you mentioned you have already tried.
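A rough sketch of the bisection approach, again with the standard library only. The function names, the bracket of 10 standard deviations around the means, and the tolerance are assumptions chosen for illustration; the bracket would need widening for exceedance probabilities extremely close to 0 or 1.

```python
from statistics import NormalDist

def max_cdf(x, dists):
    """P(max <= x) for independent tasks: product of the individual CDFs."""
    p = 1.0
    for d in dists:
        p *= d.cdf(x)
    return p

def time_exceeded_with_prob(alpha, dists, tol=1e-9):
    """Find t such that P(max > t) = alpha, by bisection on the CDF of the max."""
    target = 1.0 - alpha
    # Assumed bracket: 10 standard deviations on either side of the means.
    lo = min(d.mean - 10.0 * d.stdev for d in dists)
    hi = max(d.mean + 10.0 * d.stdev for d in dists)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if max_cdf(mid, dists) < target:
            lo = mid  # too early: not enough probability mass below mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical example: time that 5 parallel tasks exceed only 5% of the time.
tasks = [NormalDist(mu=10.0, sigma=2.0) for _ in range(5)]
t95 = time_exceeded_with_prob(0.05, tasks)
print(t95)
```

When all tasks share the same distribution, the equation can be solved exactly instead: t = `tasks[0].inv_cdf((1 - alpha) ** (1 / n))`, which is a handy check on the bisection result.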

This isn't exactly a programming problem, but what you're looking for are the distributions of the order statistics of normal random variables, i.e. the expected value, variance, etc. of the job that took the longest, the shortest, and so on. This is a solved problem when all tasks share the same mean and variance, because you can rescale every random variable to the standard normal distribution, whose order statistics have been analyzed.
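To illustrate the rescaling argument, the sketch below computes the mean and variance of the largest of n standard normals by plain numerical integration of the density n * pdf(x) * cdf(x)^(n-1), then shifts and scales the result to mean t and variance v. This is a simple midpoint-rule stand-in, not the algorithm from the paper cited in this answer; the function names, integration range, and step count are assumptions.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: mean 0, standard deviation 1

def standard_max_moments(n, lo=-10.0, hi=10.0, steps=20_000):
    """Mean and variance of the max of n iid N(0, 1) draws (midpoint rule)."""
    h = (hi - lo) / steps
    m1 = m2 = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        # Density of the maximum of n iid variables: n * f(x) * F(x)^(n-1).
        density = n * STD.pdf(x) * STD.cdf(x) ** (n - 1)
        m1 += x * density * h
        m2 += x * x * density * h
    return m1, m2 - m1 * m1

def parallel_time_moments(t, v, n):
    """Rescale the standard result to n tasks with mean t and variance v."""
    e_std, var_std = standard_max_moments(n)
    return t + v ** 0.5 * e_std, v * var_std

# Hypothetical example: 5 parallel tasks, each with mean 10 and variance 4.
print(parallel_time_moments(10.0, 4.0, 5))
```

The rescaling works because max(t + s*Z_1, ..., t + s*Z_n) = t + s*max(Z_1, ..., Z_n) when s = sqrt(v) > 0, so the mean shifts by t and scales by s, while the variance scales by v.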

Here's the paper that gives you the answer, though you're going to need some math knowledge to understand it:

J. P. Royston, "Algorithm AS 177: Expected Normal Order Statistics (Exact and Approximate)", Journal of the Royal Statistical Society, Series C (Applied Statistics), Vol. 31, No. 2 (1982), pp. 161-165.

See this post on stats.stackexchange for more information.

