
Random element in fitness function genetic algorithm

So I am using a genetic algorithm to train a feedforward neural network, tasked with reproducing a function given to the genetic algorithm, e.g. f(x) = x**2, or something more complicated obviously.

I realized I am using random inputs in my fitness function, which makes the fitness somewhat random for a given member of the population, although it still tracks how close that member is to the given function. A colleague remarked that it is strange that the same member of the population doesn't always get the same fitness, which I agree is a little unconventional. However, it got me thinking: is there any reason why this would be bad for the genetic algorithm? I actually think it might be quite good, because it lets me use a rather small test set, which speeds up each generation while still avoiding overfitting to any given test set.

Does anyone have experience with this?

(The fitness function is the MSE compared to the given function, for a randomly generated test set of 10 iterations.)
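To make the setup concrete, here is a minimal sketch of such a fitness function. It assumes the target is x**2, that inputs are drawn uniformly from [-1, 1], and that "10 iterations" means 10 random test inputs; `candidate` is a hypothetical stand-in for a network, not the actual model from the question.

```python
import random

def target(x):
    # The function the network should reproduce (x**2, as in the question).
    return x ** 2

def candidate(x):
    # Hypothetical stand-in for one member of the population:
    # a slightly wrong approximation of the target.
    return 0.9 * x ** 2 + 0.05

def noisy_fitness(model, rng, n_points=10):
    # MSE against the target on a freshly drawn test set (lower is better).
    # A new test set is drawn on every call, so the result varies.
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_points)]
    return sum((model(x) - target(x)) ** 2 for x in xs) / n_points

rng = random.Random(0)
scores = [noisy_fitness(candidate, rng) for _ in range(5)]
print(scores)  # five different values for the same candidate
```

The candidate never changes, yet every call returns a different score, which is the effect the colleague noticed.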

A consistent fitness value is necessary for efficient progression of your evolutionary algorithm. Imagine the extreme case: if fitness evaluation for your candidates is always 100% random, then your algorithm will perform random search (which is not efficient).

If your fitness evaluation is not consistent, it usually means you have not successfully abstracted the meaning of "value" in your problem (and this is sometimes hard!) or it may be the result of random factors (more similar to what I understand from your description). These are often countered by averaging.

If, in your case, those random inputs are truly an advantage, consider adding some averaging, which would make fitness evaluation more consistent, although slower.
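As a rough illustration of the averaging idea, the sketch below compares the spread of a single noisy evaluation with the spread of an average over several evaluations. The target x**2, the input range [-1, 1], the hypothetical `candidate` and the choice of 20 repeats are all assumptions made for the example.

```python
import random
import statistics

def target(x):
    return x ** 2

def candidate(x):
    # Hypothetical imperfect candidate, used only for illustration.
    return 0.9 * x ** 2 + 0.05

def single_fitness(rng, n_points=10):
    # One noisy evaluation: MSE on a freshly drawn test set.
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_points)]
    return sum((candidate(x) - target(x)) ** 2 for x in xs) / n_points

def averaged_fitness(rng, repeats=20):
    # Mean of several noisy evaluations: steadier, but `repeats` times slower.
    return statistics.mean(single_fitness(rng) for _ in range(repeats))

rng = random.Random(1)
single = [single_fitness(rng) for _ in range(200)]
averaged = [averaged_fitness(rng) for _ in range(200)]

spread_single = statistics.stdev(single)
spread_averaged = statistics.stdev(averaged)
print(spread_single, spread_averaged)
```

Averaging 20 evaluations of 10 points costs the same as one evaluation on 200 points, so this is the speed versus consistency trade-off in its simplest form.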

But, in short, slow evaluations are not good (you are right about that) and neither are inconsistent fitness values. In the end, feel free to find your own balance.

Edit based on the comments:

Imagine the task where an artificial neural network (ANN) has to reproduce a function, for example f(x) = x (where the ANN only has one input, x, and one output, f(x), but perhaps many hidden units needed for more complex cases).

We could imagine always testing fitness on the same set of points, e.g., test f(x) for x = {0.2, 0.4, 0.6, 0.8}. The closer the output is to the expected f(x) = x at each point, the higher the fitness. This will be consistent, but may result in overfitting, as the image shows:

[Image: overfitting example]

The solution is very good close to the test points, but unpredictable elsewhere. The search algorithm will likely be efficient because of the consistent evaluation, but results may not be good.
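The overfitting risk with fixed test points can be reproduced with a small sketch. The candidate below is a made-up function, chosen because it matches f(x) = x at the four fixed points and deviates in between; it is not the output of any real network.

```python
import math

FIXED_POINTS = [0.2, 0.4, 0.6, 0.8]

def target(x):
    return x

def overfit_candidate(x):
    # Hypothetical candidate: sin(5*pi*x) is zero at 0.2, 0.4, 0.6 and 0.8,
    # so it agrees with the target at the fixed points and wiggles elsewhere.
    return x + 0.3 * math.sin(5 * math.pi * x)

def mse(model, xs):
    return sum((model(x) - target(x)) ** 2 for x in xs) / len(xs)

# Error on the fixed test points: essentially zero (only rounding error).
fixed_error = mse(overfit_candidate, FIXED_POINTS)

# Error on a dense grid over [0, 1]: clearly not zero.
dense_grid = [i / 1000 for i in range(1001)]
dense_error = mse(overfit_candidate, dense_grid)

print(fixed_error, dense_error)
```

A fixed-point fitness function would rate this candidate as nearly perfect, even though it is a poor fit over most of the interval.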

An alternative approach is to use a random set of test points every iteration, e.g., x = {0.13, 0.19, 0.56, 0.99}. Because the test points are different every time, the result must be good everywhere. The downside is inconsistent evaluation, as shown in the image:

[Image: inconsistent evaluation]

The same candidate solution seems good with test set A, and quite bad with test set B. Under these conditions the search algorithm may be less efficient, but the solution will be better across the range of values we want.

Depending on our specific case we can improve things by having more iterations, by having larger test sets (better average) or by trying intermediate solutions. For example, one might consider always testing three random points, where the first is always between 0 and 1/3, the second between 1/3 and 2/3, and the third between 2/3 and 1. The possibilities are really endless and the best choice will depend on each problem.
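The three-point idea is a form of stratified sampling and could be sketched as follows. The function name `stratified_points` and its parameters are made up for this example.

```python
import random

def stratified_points(rng, n_strata=3, low=0.0, high=1.0):
    # Split [low, high] into n_strata equal sub-intervals and draw
    # one uniform random point from each of them.
    width = (high - low) / n_strata
    return [
        rng.uniform(low + i * width, low + (i + 1) * width)
        for i in range(n_strata)
    ]

rng = random.Random(0)
points = stratified_points(rng)
print(points)  # one point in [0, 1/3], one in [1/3, 2/3], one in [2/3, 1]
```

The points still change on every call, but each part of the input range is always covered, which should reduce how much the fitness of a single candidate jumps between evaluations.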

Note that many tasks will not have this problem at all. For example, in the classic XOR we only need to test for {X1 = 0, X2 = 0; X1 = 1, X2 = 0; X1 = 0, X2 = 1; X1 = 1, X2 = 1}. Of course it will be fast to test all four cases!
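For a task like XOR, the whole input space can be checked on every evaluation, so the fitness is both cheap and deterministic. A minimal sketch, with two hypothetical candidates standing in for networks:

```python
# All four XOR cases: ((X1, X2), expected output).
XOR_CASES = [((0, 0), 0), ((1, 0), 1), ((0, 1), 1), ((1, 1), 0)]

def xor_fitness(model):
    # MSE over the complete input space: no sampling, so no noise.
    errors = [(model(x1, x2) - expected) ** 2 for (x1, x2), expected in XOR_CASES]
    return sum(errors) / len(errors)

def perfect(x1, x2):
    # Hypothetical candidate that has learned XOR exactly.
    return x1 ^ x2

def always_half(x1, x2):
    # Hypothetical candidate that ignores its inputs.
    return 0.5

print(xor_fitness(perfect), xor_fitness(always_half))
```

Calling `xor_fitness` twice on the same candidate always gives the same number, so the consistency question does not arise here.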

Normally you use a seed for genetic algorithms, which should be fixed. It will always generate the same "random" children in the same order, which makes your approach reproducible. So the genetic algorithm is effectively pseudo-random. That is the standard way to run genetic algorithms.
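A small sketch of what a fixed seed gives you, using a toy mutation-only loop. The `evolve` function, the population size and the mutation scale are assumptions for illustration, not a full genetic algorithm.

```python
import random

def evolve(seed, generations=5, pop_size=4):
    # A dedicated generator, so the whole run depends only on the seed.
    rng = random.Random(seed)
    population = [rng.uniform(-1.0, 1.0) for _ in range(pop_size)]
    for _ in range(generations):
        # Toy mutation step: each child is its parent plus Gaussian noise.
        population = [p + rng.gauss(0.0, 0.1) for p in population]
    return population

run_a = evolve(seed=42)
run_b = evolve(seed=42)
run_c = evolve(seed=43)

print(run_a == run_b)  # same seed, same sequence of children
print(run_a == run_c)  # different seed, different children
```

Note that a fixed seed makes a whole run repeatable. It does not by itself give one candidate the same fitness on every evaluation within that run if the test set is redrawn each time.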

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 