
Service Fabric Upgrade stuck on PreUpgradeSafetyCheck

I received a warning that a new version of Service Fabric is available. However, when I tried to upgrade, the process got stuck at PreUpgradeSafetyCheck on node Rep_247. I've tried -Force and -ForceRestart, but neither helped.

Cluster Map

This is most likely happening because Service Fabric can't safely take down a service in order to upgrade the node or application.

Whenever a node is upgraded, the services running on it must first move to another node, so that the node can be restarted without affecting the availability of your applications/services.

In this case, taking the node down may cause quorum loss because the service can't be placed on another node: there may be no other node available, the service's placement constraints may rule out the remaining nodes, or the service may have only one instance.

Because SF can't guarantee the reliability of the service, it halts the upgrade until the problem is fixed, after which the upgrade can continue.
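To make the quorum argument concrete, here is a minimal Python sketch of the idea. It is a simplified model, not Service Fabric's actual safety-check implementation; the function names and the node names `N1` to `N3` are made up for illustration, and quorum is assumed to mean a simple majority of the replica set.

```python
from typing import List


def quorum_size(replica_count: int) -> int:
    """Majority of the replica set (assumed definition of quorum)."""
    return replica_count // 2 + 1


def safe_to_take_down(replica_nodes: List[str], node: str) -> bool:
    """True if a majority of replicas would remain once `node` goes down."""
    remaining = [n for n in replica_nodes if n != node]
    return len(remaining) >= quorum_size(len(replica_nodes))


# A service whose only replica lives on the node being upgraded.
print(safe_to_take_down(["Rep_247"], "Rep_247"))        # False
# A service with three replicas on three hypothetical nodes.
print(safe_to_take_down(["N1", "N2", "N3"], "N1"))      # True
```

With a single replica, nothing is left to serve requests once the node goes down, which is the situation in which the upgrade waits instead of proceeding.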

The cluster map and the message point to the issue: your cluster has only one node of type ReportServerType (Rep_247). I am assuming you have services with placement constraints that restrict them to this node type. Taking down the node would make these services unavailable, because the placement constraints prevent them from moving to another node type.
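The effect of a node-type constraint can be sketched in a few lines of Python. This is only an illustration of the reasoning, not how Service Fabric evaluates placement; `eligible_targets` is a hypothetical helper, and the `FE_1`/`FE_2` nodes of type `FrontEndType` are invented so the cluster has a node type with more than one node.

```python
from typing import Dict, List


def eligible_targets(nodes: Dict[str, str], required_type: str,
                     upgrading: str) -> List[str]:
    """Nodes of `required_type` that could host the service while
    `upgrading` is down. `nodes` maps node name -> node type."""
    return sorted(
        name for name, node_type in nodes.items()
        if node_type == required_type and name != upgrading
    )


cluster = {
    "Rep_247": "ReportServerType",
    "MR_236": "MRType",
    "FE_1": "FrontEndType",   # hypothetical node
    "FE_2": "FrontEndType",   # hypothetical node
}

# Constrained to ReportServerType: no other node can take the service.
print(eligible_targets(cluster, "ReportServerType", "Rep_247"))  # []
# A type with two nodes always has somewhere to move to.
print(eligible_targets(cluster, "FrontEndType", "FE_1"))         # ['FE_2']
```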

If the services are not constrained to that node type, the problem might be:

  • The service is failing to activate on other nodes, for example because dependencies are missing there, so the minimum replica count can't be met.
  • The service has only one instance, and taking it down would make the service unavailable.

PS: the same applies to node MR_236 (MRType).

PreUpgradeSafetyCheck

An UpgradePhase of PreUpgradeSafetyCheck means there were issues preparing the upgrade domain before the upgrade was performed. The most common issues in this case are service errors in the close or demotion-from-primary code paths.

Possible solutions for this case are:

  • Add more replicas/instances of the service so the minimum quorum is met.
  • Remove the placement constraints of the service so it can move to other nodes.
  • Add an extra node of the same node type so that the service can move out safely.
  • Take down the service and recreate it once the node is updated (a last resort, and only for stateless services; a stateful service would lose its data).
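As a rough what-if, the following Python sketch shows why two of these remedies unblock the node. It models the check very loosely (a service blocks the upgrade if taking its node down breaks a majority quorum and no spare eligible node exists to move a replica to first); the real safety check is more involved. The service name `Reports`, the node `Rep_248`, and every function and class name are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Service:
    name: str
    replica_nodes: List[str]
    required_type: Optional[str] = None  # None means no placement constraint


def blocks_upgrade(svc: Service, nodes: Dict[str, str], upgrading: str) -> bool:
    """True if `svc` would keep `upgrading` from being taken down."""
    if upgrading not in svc.replica_nodes:
        return False
    remaining = len(svc.replica_nodes) - 1
    if remaining >= len(svc.replica_nodes) // 2 + 1:
        return False  # a majority survives without this node
    # Otherwise a replica must move first, so a spare eligible node is needed.
    spares = [
        n for n, t in nodes.items()
        if n != upgrading
        and n not in svc.replica_nodes
        and (svc.required_type is None or t == svc.required_type)
    ]
    return not spares


nodes = {"Rep_247": "ReportServerType", "MR_236": "MRType"}
reports = Service("Reports", ["Rep_247"], "ReportServerType")

print(blocks_upgrade(reports, nodes, "Rep_247"))  # True: nowhere to go

# Remedy: drop the placement constraint.
unconstrained = Service("Reports", ["Rep_247"], None)
print(blocks_upgrade(unconstrained, nodes, "Rep_247"))  # False

# Remedy: add a second node of the same type (hypothetical Rep_248).
bigger = dict(nodes, Rep_248="ReportServerType")
print(blocks_upgrade(reports, bigger, "Rep_247"))  # False
```

Adding replicas helps in the same way, but only if there are enough eligible nodes to host them, which is why it usually goes hand in hand with adding nodes of that type.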

