
Need help designing fitness evaluation for a NEAT algorithm-based neural network

I am working on a neural network based on the NEAT algorithm that learns to play an Atari Breakout clone in Python 2.7. All of the pieces work, but I think the evolution could be greatly improved with a better algorithm for calculating species fitness.

The inputs to the neural network are:

  • X coordinate of the center of the paddle
  • X coordinate of the center of the ball
  • Y coordinate of the center of the ball
  • ball's dx (velocity in X)
  • ball's dy (velocity in Y)

The outputs are:

  • Move paddle left
  • Move paddle right
  • Do not move paddle
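The post does not say how the three outputs are turned into a single move. One common choice, shown here only as a sketch, is to take the output with the largest activation. The names `ACTIONS` and `choose_action` are hypothetical, and the output order (left, right, stay) is assumed from the list above.

```python
# Hypothetical helper: the names and the output order are assumptions,
# not part of the original project.
ACTIONS = ("left", "right", "stay")


def choose_action(outputs):
    """Return the action whose output activation is largest.

    `outputs` is assumed to be a sequence of three numbers in the order
    (move left, move right, do not move). Ties go to the earliest entry.
    """
    if len(outputs) != len(ACTIONS):
        raise ValueError("expected %d outputs" % len(ACTIONS))
    best_index = max(range(len(outputs)), key=lambda i: outputs[i])
    return ACTIONS[best_index]
```

With this rule, the network never issues conflicting commands in a single frame, because exactly one action is chosen.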

The parameters I have available to the species fitness calculation are:

  • breakout_model.score - int : the final score of the game played by the species
  • breakout_model.num_times_hit_paddle - int : the number of times the paddle hit the ball
  • breakout_model.hits_per_life - list of int : the number of times the paddle hit the ball per life; e.g. the first element is the value for the first life, the second element is the value for the second life, and so on up to 4
  • breakout_model.avg_paddle_offset_from_ball - decimal : the average linear distance in the X direction between the ball and the center of the paddle
  • breakout_model.avg_paddle_offset_from_center - decimal : the average linear distance in the X direction between the center of the frame and the center of the paddle
  • breakout_model.time - int : the total duration of the game, measured in frames
  • breakout_model.stale - boolean : whether or not the game was artificially terminated due to staleness (e.g. the ball gets stuck bouncing straight up and down while the paddle does not move)

If you think I need more data about the final state of the game than just these, I can likely add a way to get it very easily.

Here is my current fitness calculation, which I don't think is very good:

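def calculate_fitness(self):
    # Start from the final game score.
    self.fitness = self.breakout_model.score
    if self.breakout_model.num_times_hit_paddle != 0:
        # Small bonus per paddle hit; 10.0 avoids Python 2 integer division.
        self.fitness += self.breakout_model.num_times_hit_paddle / 10.0
    else:
        # Never touching the ball costs a small fixed penalty.
        self.fitness -= 0.5
    if self.breakout_model.avg_paddle_offset_from_ball != 0:
        # Inverse-distance penalty: it grows as the average offset shrinks.
        self.fitness -= (1 / self.breakout_model.avg_paddle_offset_from_ball) * 100
    for hits in self.breakout_model.hits_per_life:
        # Each life without a single paddle hit costs a little more.
        if hits == 0:
            self.fitness -= 0.2
    if self.breakout_model.stale:
        # Flip the sign of the fitness when the game was cut short as stale.
        self.fitness = 0 - self.fitness
    return self.fitness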
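As a quick numeric check of the offset term in the function above, the sketch below evaluates `(1 / offset) * 100` for a few sample offsets. The helper name `offset_penalty` and the sample values are made up for illustration. The printed values show that the subtracted penalty is largest when the paddle stays closest to the ball on average, which may be the opposite of what is intended.

```python
from __future__ import division  # keeps "/" as true division on Python 2.7


def offset_penalty(avg_offset):
    """Penalty term as written in the question: (1 / offset) * 100.

    `avg_offset` must be non-zero, matching the guard in the original code.
    """
    return (1 / avg_offset) * 100


# Hypothetical sample offsets in pixels, chosen only for illustration.
for offset in (1.0, 10.0, 100.0):
    print("offset %.1f -> penalty %.2f" % (offset, offset_penalty(offset)))
```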

Here is what I think the fitness calculation should do, semantically:

  • The score, obviously, should have the most significant impact on the overall fitness. Maybe a score of 0 should slightly negatively affect the fitness?
  • The number of times that the paddle hit the ball per life should have some effect, but with a smaller weight. E.g. if that number is 0, it didn't really try to hit the ball at all during that life, so it should have a negative effect.
  • The total number of times that the paddle hit the ball should also have some effect, and its contribution should depend on the score. E.g. if it didn't hit the ball many times and also didn't score many points, that should have a significant negative effect; if it didn't hit the ball many times but scored a high number of points, that should have a significant positive effect. Overall, (I think) the closer this value is to the game score, the less weight it should have on fitness.
  • The average distance in the X direction between the center of the frame and the center of the paddle should basically encourage a central "resting" position for the paddle.
  • If the game was ended artificially due to staleness, either this should have a significant negative effect, or it should automatically force the fitness to be 0.0; I'm not sure which would be better.
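One possible reading of the points above, as a rough sketch rather than a tuned solution: normalize each signal to the range 0 to 1, weight the score most heavily, and force stale games to 0.0. Everything not in the list of parameters is an assumption made for illustration: the `GameStats` container, `max_score`, `HALF_FRAME_WIDTH`, and all of the weights. This version rewards good behaviour instead of subtracting penalties, which is only one of the options raised above.

```python
from __future__ import division  # keeps "/" as true division on Python 2.7

from collections import namedtuple

# Hypothetical container mirroring some of the breakout_model fields.
GameStats = namedtuple(
    "GameStats",
    "score num_times_hit_paddle hits_per_life "
    "avg_paddle_offset_from_center stale",
)

# Illustrative constants, not tuned values.
HALF_FRAME_WIDTH = 200.0  # assumed half width of the play field in pixels
W_SCORE, W_EFFICIENCY, W_LIVES, W_CENTER = 1.0, 0.2, 0.1, 0.05


def sketch_fitness(stats, max_score):
    """Weighted sum of normalized terms; `max_score` must be positive."""
    if stats.stale:
        return 0.0  # one of the two options considered for stale games

    # Dominant term: fraction of the maximum achievable score.
    score_term = stats.score / max_score

    # Points per paddle hit, squashed into [0, 1) with p / (1 + p).
    hits = stats.num_times_hit_paddle
    points_per_hit = stats.score / hits if hits else 0.0
    efficiency_term = points_per_hit / (1.0 + points_per_hit)

    # Fraction of lives in which the paddle hit the ball at least once.
    lives = stats.hits_per_life
    lives_term = sum(1 for h in lives if h > 0) / len(lives) if lives else 0.0

    # 1.0 when the paddle rests at the center, 0.0 at or beyond the edge.
    offset_ratio = stats.avg_paddle_offset_from_center / HALF_FRAME_WIDTH
    center_term = 1.0 - min(offset_ratio, 1.0)

    return (W_SCORE * score_term + W_EFFICIENCY * efficiency_term
            + W_LIVES * lives_term + W_CENTER * center_term)
```

Because every term is bounded, changing one weight changes how much that signal matters without rescaling the others.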

I'm not sure how to combine all these values so that they affect the overall fitness appropriately.

Thanks in advance for any help you can provide.

I would minimize the conditional logic in your fitness function, using it only in those cases where you want to force the fitness score to 0 or apply a major penalty. I would just decide how much weight each component of the score should have, multiply each component by its weight, and sum the results. Negative components just make the fitness function harder to understand, with no real benefit; the model learns from the relative difference in scores. So my version of the function would look something like this:

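def fitness(game_score, max_score, total_hits, game_score_per_life, hits_per_life):
    # game_score_per_life and hits_per_life are treated as single numbers here.
    # No paddle hits at all: force the fitness to 0.
    if total_hits == 0:
        return 0
    # Weighted sum of the components; float() avoids Python 2 integer division.
    return (float(game_score) / max_score) * .7 \
           + float(game_score) / total_hits * .2 \
           + float(game_score_per_life) / hits_per_life * .1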
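To illustrate the point about relative differences, here is a minimal sketch with made-up genome names and fitness values. Adding a constant to every fitness, for example to remove negative values, leaves the ranking of the genomes unchanged. Some NEAT implementations allocate offspring in proportion to fitness, in which case shifts and scaling can matter, so this shows only the ordering argument.

```python
def rank_genomes(fitness_by_name):
    """Return genome names sorted from fittest to least fit."""
    return sorted(fitness_by_name, key=fitness_by_name.get, reverse=True)


# Hypothetical fitness values, including a negative one.
raw = {"a": -3.0, "b": 0.5, "c": 2.0}

# Shift every value by the same constant so that none is negative.
shifted = dict((name, value + 3.0) for name, value in raw.items())

# The ordering is identical before and after the shift.
assert rank_genomes(raw) == rank_genomes(shifted)
```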

(Aside: I didn't include "distance from center of frame" because I think that's cheating; if staying near the center is a good way to maximize play efficiency, then the agent should learn that on its own. If you sneak all the intelligence into the fitness function, then your agent isn't intelligent at all.)


 