
Service Fabric Upgrade stuck on PreUpgradeSafetyCheck

I received a warning that a new version of Service Fabric is available. However, when I tried to upgrade, the process got stuck at PreUpgradeSafetyCheck on node Rep_247. I've tried -Force and -ForceRestart, but neither helped.

Cluster Map

This is most likely happening because Service Fabric can't safely take down a service in order to upgrade the node or application.

Whenever a node is upgraded, the services activated on it must first move to another node, so that the node can be restarted without affecting the availability of your applications and services.

In this case, taking the node down may cause quorum loss because the service can't be placed on another node: there may be no other node available, placement constraints on the service may rule out the remaining nodes, or the service may have only one instance.
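The quorum argument above can be sketched with a little arithmetic. This is a toy model, not Service Fabric's actual safety check: it assumes a simple majority quorum, and the function names `quorum_size` and `safe_to_take_down` are made up for illustration.

```python
def quorum_size(replica_count: int) -> int:
    """Majority quorum (assumed): more than half of the replicas."""
    return replica_count // 2 + 1


def safe_to_take_down(replica_count: int, replicas_on_node: int = 1) -> bool:
    """True if the replicas left after the node goes down still form a quorum."""
    remaining = replica_count - replicas_on_node
    return remaining >= quorum_size(replica_count)


# Print replica count, quorum size, and whether losing one replica is safe.
for n in (1, 2, 3, 5):
    print(n, quorum_size(n), safe_to_take_down(n))
```

Under these assumptions, a service with one or two replicas cannot lose a replica without losing quorum, while a service with three or five replicas can.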

Because SF can't guarantee the reliability of the service, it halts the upgrade until the problem is fixed and the process can continue.

The issue can be identified from your cluster map and the message: your cluster has only one node of type 'ReportServerType' (Rep_247). I am assuming you have services with placement constraints that restrict them to this node type. Taking down the node would make these services unavailable, because the placement constraints prevent them from moving to a node of another type.

If the services are not constrained to that node type, the problem might be:

  • The service is failing to activate on other nodes, for example because dependencies are missing there, so the minimum replica count can't be met.
  • The service has only one instance, and taking it down would make the service unavailable.

PS: the same applies to node MR_236 (MRType).

PreUpgradeSafetyCheck

An UpgradePhase of PreUpgradeSafetyCheck means there were issues preparing the upgrade domain before it was performed. The most common issues in this case are service errors in the close or demotion from primary code paths.

Possible solutions for this case are:

  • Add more replicas or instances of the service so that the minimum quorum is met.
  • Remove the placement constraints from the services so they can move to other nodes.
  • Add an extra node of the same node type so that the services can move out safely.
  • Take down the service and recreate it once the node is upgraded (a last resort, and only for stateless services; a stateful service would lose its data).
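To see why removing the constraint or adding a node unblocks the upgrade, here is a minimal sketch of the placement reasoning. It is a simplified model, not how Service Fabric implements the check. The node names Rep_247 and MR_236 and their types come from the question; `FE_1`, `FrontEndType`, `Rep_248`, `ReportService`, and every class and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    node_type: str


@dataclass(frozen=True)
class Service:
    name: str
    allowed_node_types: frozenset  # empty set means "no placement constraint"


def eligible_targets(service, nodes, draining):
    """Nodes other than the draining one that the service is allowed to run on."""
    return [
        n for n in nodes
        if n.name != draining.name
        and (not service.allowed_node_types
             or n.node_type in service.allowed_node_types)
    ]


def can_drain(node, hosted_services, nodes):
    """True if every service hosted on the node has somewhere else to go."""
    return all(eligible_targets(s, nodes, node) for s in hosted_services)


rep = Node("Rep_247", "ReportServerType")
nodes = [rep, Node("MR_236", "MRType"), Node("FE_1", "FrontEndType")]  # FE_1 is hypothetical

constrained = Service("ReportService", frozenset({"ReportServerType"}))
blocked = can_drain(rep, [constrained], nodes)

# Fix 1: remove the placement constraint.
unconstrained = Service("ReportService", frozenset())
fixed_by_removing_constraint = can_drain(rep, [unconstrained], nodes)

# Fix 2: add a second node of the same type (hypothetical Rep_248).
more_nodes = nodes + [Node("Rep_248", "ReportServerType")]
fixed_by_adding_node = can_drain(rep, [constrained], more_nodes)

print(blocked, fixed_by_removing_constraint, fixed_by_adding_node)
```

In this model the constrained service blocks the drain of Rep_247, and either fix gives it a valid target node.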

