Need help designing fitness evaluation for a NEAT algorithm-based neural network
I am working on a neural network based on the NEAT algorithm that learns to play an Atari Breakout clone in Python 2.7. I have all of the pieces working, but I think the evolution could be greatly improved with a better algorithm for calculating species fitness.
The inputs to the neural network are:
The outputs are:
The parameters I have available to the species fitness calculation are:
breakout_model.score - int: the final score of the game played by the species
breakout_model.num_times_hit_paddle - int: the number of times the paddle hit the ball
breakout_model.hits_per_life - int: the number of times the paddle hit the ball per life, in the form of a list; e.g. the first element is the value for the first life, the 2nd element is the value for the 2nd life, and so on up to 4
breakout_model.avg_paddle_offset_from_ball - decimal: the average linear distance in the X direction between the ball and the center of the paddle
breakout_model.avg_paddle_offset_from_center - decimal: the average linear distance in the X direction between the center of the frame and the center of the paddle
breakout_model.time - int: the total duration of the game, measured in frames
breakout_model.stale - boolean: whether or not the game was artificially terminated due to staleness (e.g. the ball gets stuck bouncing directly vertically and the paddle is not moving)
If you think I need more data about the final state of the game than just these, I can likely implement a way to get it very easily.
Here is my current fitness calculation, which I don't think is very good:
def calculate_fitness(self):
    # Start from the raw game score.
    self.fitness = self.breakout_model.score
    if self.breakout_model.num_times_hit_paddle != 0:
        # 10.0 forces float division; in Python 2.7, int / int truncates.
        self.fitness += self.breakout_model.num_times_hit_paddle / 10.0
    else:
        self.fitness -= 0.5
    if self.breakout_model.avg_paddle_offset_from_ball != 0:
        self.fitness -= (1 / self.breakout_model.avg_paddle_offset_from_ball) * 100
    # Small penalty for every life in which the paddle never hit the ball.
    for hits in self.breakout_model.hits_per_life:
        if hits == 0:
            self.fitness -= 0.2
    # Stale games have the sign of their fitness flipped.
    if self.breakout_model.stale:
        self.fitness = 0 - self.fitness
    return self.fitness
Here is what I think the fitness calculation should do, semantically:
I'm not sure how to operate on all these values to make them affect the overall fitness appropriately.
Thanks in advance for any help you can provide.
I would minimize the conditional logic in your fitness function, using it only in those cases where you want to force the fitness score to 0 or apply a major penalty. I would just decide how much weight each component of the score should have and multiply each component by its weight. Negative components just make the fitness function harder to understand, with no real benefit; the model learns from the relative difference in scores. So my version of the function would look something like this:
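def fitness(game_score, max_score, total_hits, game_score_per_life, hits_per_life):
    # A game with no paddle hits gets zero fitness (this also avoids dividing by zero).
    if total_hits == 0:
        return 0
    # Weighted sum of the components; float() avoids Python 2.7 integer division.
    return (float(game_score) / max_score) * .7 \
        + float(game_score) / total_hits * .2 \
        + float(game_score_per_life) / hits_per_life * .1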
(Aside: I didn't include "distance from center of frame" because I think that's cheating; if staying near the center is a good thing to do to maximize play efficiency, then the agent should learn that on its own. If you sneak all the intelligence into the fitness function, then your agent isn't intelligent at all.)
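As a rough sketch, the weighted-sum approach from this answer could be wired to the `breakout_model` fields listed in the question along the following lines. Several things here are assumptions rather than part of the original code: the `Model` namedtuple is a hypothetical stand-in for `breakout_model`, `max_score` is a value you would have to supply for your own game, treating a stale game as zero fitness is one possible choice, and the third term (share of lives with at least one paddle hit) is a substitute for the per-life score ratio, since a per-life score is not among the listed fields.

```python
from collections import namedtuple

# Hypothetical stand-in for the question's breakout_model (only the fields used here).
Model = namedtuple("Model", "score num_times_hit_paddle hits_per_life stale")


def weighted_fitness(model, max_score, weights=(0.7, 0.2, 0.1)):
    """Weighted-sum fitness sketch; max_score and the weights are assumed values."""
    # Conditionals are used only to force the fitness to 0 (assumption: stale
    # games and games with no paddle hits are both treated as failures).
    if model.stale or model.num_times_hit_paddle == 0:
        return 0.0

    w_score, w_per_hit, w_lives = weights
    # float() keeps the divisions exact on Python 2.7 as well as Python 3.
    score_ratio = float(model.score) / max_score
    score_per_hit = float(model.score) / model.num_times_hit_paddle
    # Substitute for the per-life term: share of lives with at least one hit.
    lives = model.hits_per_life
    lives_with_hit = float(sum(1 for h in lives if h > 0)) / len(lives) if lives else 0.0

    return w_score * score_ratio + w_per_hit * score_per_hit + w_lives * lives_with_hit
```

For example, `weighted_fitness(Model(50, 10, [4, 3, 3, 0], False), max_score=100)` combines a score ratio of 0.5, 5 points per hit, and 3 of 4 lives with a hit. Note that the points-per-hit term is not bounded by 1, so it may need its own normalization if you want the weights to be directly comparable.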