I have been struggling to figure out the best way to approach this problem in a bash script. I have a command that checks groups of servers for their uptime in minutes. I only want to continue on to the next group of reboots once all of the servers have been up for 5 minutes, but I also want to verify they haven't been up for over an hour, in case the reboot doesn't take.
I was originally trying to set up a while loop that would keep issuing the command to check uptimes and send the output into an array. I am trying to figure out how to loop through an array until all elements of that array are greater than 5 and less than 60. I haven't even been successful with the first check of greater than 5. Is it even possible to continually write to an array and perform arithmetic checks against every value in it, so that all values must be greater than X, in a while loop? The number of servers reporting their current uptime varies per group, so the array won't always hold the same number of values.
Is an array even the proper way to do this? I'd provide examples of what I have tried so far but it's a huge mess and I think starting from scratch just asking for input might be best to start with.
Output of the command I am running to pull uptimes looks similar to the following:
1
2
1
4
3
2
Edit
Thanks to the help provided, I was able to get a functional proof of concept together for this, and I'm stoked. Here it is in case it helps anyone trying to do something similar in the future. The problem at hand was that we use AWS SSM for all of our Windows server patching, and many times when SSM tells servers to reboot after patching, the SSM Agent takes ages to check in. This slows our entire process down, which right now is fairly manual across dozens of patch groups. Many times we have to go and manually verify that a server did indeed reboot after we told it to from SSM, so that we know we can start the reboots for the next patch group. With this, we will be able to run a single script that issues reboots for our patch groups in the proper order and verifies that the servers have properly rebooted before continuing on to the next group.
#!/bin/bash
### The purpose of this script is to automate the execution of commands required
### to reboot groups of AWS Windows servers utilizing SSM, while also verifying
### their uptime and only continuing on to the next group once the previous one
### has reached X minutes. This solves the problem of AWS SSM Agents not
### properly checking in with SSM post-reboot.
patchGroups=(01 02 03) # array containing the values of the RebootGroup tag
for group in "${patchGroups[@]}"
do
    printf "Rebooting Patch Group %q\n" "$group"
    # the command substitution is left unquoted on purpose so each instance ID becomes its own argument
    aws ec2 reboot-instances --instance-ids $(aws ec2 describe-instances --filters "Name=tag:RebootGroup,Values=$group" --query 'Reservations[].Instances[].InstanceId' --output text)
    sleep 2m
    unset passed failed serverList # wipe variables left over from the previous group
    declare -A passed failed # declare associative arrays
    # serverList is a plain whitespace-separated string, not an array
    serverList=$(aws ec2 describe-instances --filters "Name=tag:RebootGroup,Values=$group" --query 'Reservations[*].Instances[*].[InstanceId]' --output text)
    for server in ${serverList} # loop through list of servers
    do
        failed["${server}"]=0 # add to the failed[] array
    done
    while [[ "${#failed[@]}" -gt 0 ]] # loop while number of servers in the failed[] array is greater than 0
    do
        for server in "${!failed[@]}" # loop through servers in the failed[] array
        do
            ssmID=$(aws ssm send-command --document-name "AWS-RunPowerShellScript" --document-version "1" --targets "[{\"Key\":\"InstanceIds\",\"Values\":[\"$server\"]}]" --parameters '{"commands":["$wmi = Get-WmiObject -Class Win32_OperatingSystem ","$uptimeMinutes = ($wmi.ConvertToDateTime($wmi.LocalDateTime)-$wmi.ConvertToDateTime($wmi.LastBootUpTime) | select-object -expandproperty \"TotalMinutes\")","[int]$uptimeMinutes"],"workingDirectory":[""],"executionTimeout":["3600"]}' --timeout-seconds 600 --max-concurrency "50" --max-errors "0" --region us-west-2 --output text --query "Command.CommandId")
            sleep 5
            uptime=$(aws ssm list-command-invocations --command-id "$ssmID" --details --query 'CommandInvocations[].CommandPlugins[].Output' --output text | sed 's/\r$//')
            printf "Checking instance ID %q\n" "$server"
            printf "Value of uptime is = %q\n" "$uptime"
            # if uptime is within our 'success' window then move server to passed[] array
            if [[ "${uptime}" -ge 3 && "${uptime}" -lt 60 ]]
            then
                passed["${server}"]="${uptime}" # add to passed[] array
                printf "Server with instance ID %q has successfully rebooted.\n" "$server"
                unset 'failed[$server]' # remove from failed[] array (quoted so the shell does not glob the subscript)
            else
                failed["${server}"]="${uptime}" # remember the last uptime seen for the status display
            fi
        done
        # display current status (edit/remove as desired)
        printf "\n++++++++++++++ successful reboots\n"
        printf "%s\n" "${!passed[@]}" | sort -n
        printf "\n++++++++++++++ failed reboot\n"
        for server in "${!failed[@]}"
        do
            printf "%s - %s (mins)\n" "${server}" "${failed[${server}]}"
        done | sort -n
        printf "\n"
        sleep 60 # adjust as necessary
    done
done
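For reference, the check as originally asked (are all values in an array inside a range, for any number of values?) can be sketched roughly as below. This is only an illustration: all_in_range is a made-up helper name, the here-document stands in for the real uptime command, and the 5/60 bounds are the ones from the question.

```shell
#!/usr/bin/env bash
# all_in_range MIN MAX VALUE...
# Hypothetical helper: succeeds only if at least one value is given and
# every value is an integer strictly greater than MIN and less than MAX.
all_in_range() {
    local min="$1" max="$2"
    shift 2
    (( $# > 0 )) || return 1    # an empty list never counts as "all up"
    local value
    for value in "$@"; do
        [[ "$value" =~ ^[0-9]+$ ]] || return 1                   # reject non-numeric output
        (( 10#$value > min && 10#$value < max )) || return 1     # 10# avoids octal surprises
    done
    return 0
}

# Sample data in the same shape as the command output in the question.
mapfile -t uptimes <<'EOF'
1
2
1
4
3
2
EOF

if all_in_range 5 60 "${uptimes[@]}"; then
    echo "all servers are inside the window"
else
    echo "still waiting"
fi
```

With the real command, the array would presumably be refilled on each pass with something like mapfile -t uptimes < <(the_uptime_command), and the check repeated after a sleep.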
It sounds like you need to keep re-evaluating the output of uptime to get the data you need, so an array or other variable may just get you stuck. Think about this functionally (as in functions). You need a function that checks, just once, whether the uptime is within the bounds you want. Then you need to run that function periodically. If it succeeds, you trigger the reboot. If it fails, you let it try again later.
Consider this code:
uptime_in_bounds() {
    local min="$1"    # lower bound, in seconds
    local max="$2"    # upper bound, in seconds
    local uptime_float uptime_secs
    # The first value in /proc/uptime is the number of seconds the
    # system has been up. We have to truncate it to an integer…
    read -r uptime_float _ < /proc/uptime
    uptime_secs="${uptime_float%.*}"
    # A shell function reflects the exit status of its last command.
    # This function "succeeds" if the uptime_secs is between min and max.
    (( min < uptime_secs && max > uptime_secs ))
}
if uptime_in_bounds 300 3600; then
    sudo reboot # or whatever
fi
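The "run that function periodically" part could look something like the sketch below. retry_until is a hypothetical helper (not part of the code above), and the timeout and interval values in the usage note are arbitrary; it simply re-runs whatever command it is given until that command succeeds or the time budget is used up.

```shell
#!/usr/bin/env bash
# retry_until TIMEOUT INTERVAL COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND every INTERVAL seconds until it
# succeeds, or gives up once TIMEOUT seconds have elapsed.
retry_until() {
    local timeout="$1" interval="$2"
    shift 2
    # SECONDS is a bash builtin counting seconds since the shell started.
    local deadline=$(( SECONDS + timeout ))

    until "$@"; do
        if (( SECONDS >= deadline )); then
            return 1    # gave up: the check never succeeded in time
        fi
        sleep "$interval"
    done
    return 0
}
```

It would be called along the lines of retry_until 600 30 uptime_in_bounds 300 3600, i.e. check every 30 seconds for up to 10 minutes, and act on the exit status.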
General idea... will likely need some tweaking based on how OP is tracking servers, obtaining uptimes, etc...
# for a given set of servers, and assuming stored in variable ${server_list} ...
unset passed failed # wipe arrays
declare -A passed failed # declare associative arrays
for server in ${server_list} # loop through list of servers
do
    failed["${server}"]=0 # add to the failed[] array
done
while [[ "${#failed[@]}" -gt 0 ]] # loop while number of servers in the failed[] array is greater than 0
do
    for server in "${!failed[@]}" # loop through servers in the failed[] array
    do
        uptime=$( some_command_to_get_uptime_for_server "${server}" )
        # if uptime is within our 'success' window then move server to passed[] array
        if [[ "${uptime}" -gt 5 && "${uptime}" -lt 60 ]]
        then
            passed["${server}"]="${uptime}" # add to passed[] array
            unset 'failed[$server]' # remove from failed[] array (quoted so the shell does not glob the subscript)
        else
            failed["${server}"]="${uptime}" # remember the last uptime seen
        fi
    done
    # display current status (edit/remove as desired)
    printf "\n++++++++++++++ successful reboots\n"
    printf "%s\n" "${!passed[@]}" | sort -n
    printf "\n++++++++++++++ failed reboot\n"
    for server in "${!failed[@]}"
    do
        printf "%s - %s (mins)\n" "${server}" "${failed[${server}]}"
    done | sort -n
    printf "\n"
    sleep 30 # adjust as necessary
done
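NOTES:
- this assumes ${server_list} holds the servers for the current group; otherwise, adjust the for loop to properly populate the failed[] array
- OP may want to add logic to handle a ${server} for which the while loop continues 'too long'
- if ${uptime} is not within the 5-60 min range, OP can add to the else block to perform some other operation(s) for the problematic ${server}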
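One possible way to handle the 'too long' case is to give the while loop a deadline. The sketch below is only an illustration under a few assumptions: wait_for_group and POLL_INTERVAL are made-up names, get_uptime is a stub standing in for the real per-server uptime command, and the 5-60 minute window is the one used above.

```shell
#!/usr/bin/env bash
# Stub standing in for the real per-server uptime command (assumption).
get_uptime() { echo 10; }

# wait_for_group MAX_WAIT SERVER...
# Hypothetical helper: polls every server until its uptime is inside the
# 5-60 minute window, or fails once MAX_WAIT seconds have passed.
wait_for_group() {
    local max_wait="$1"; shift
    local -A pending=()
    local server uptime
    for server in "$@"; do
        pending["$server"]=0
    done

    local deadline=$(( SECONDS + max_wait ))
    while (( ${#pending[@]} > 0 )); do
        for server in "${!pending[@]}"; do
            uptime=$(get_uptime "$server")
            # Only accept plain integers; anything else counts as "not yet".
            if [[ "$uptime" =~ ^[0-9]+$ ]] && (( 10#$uptime > 5 && 10#$uptime < 60 )); then
                unset 'pending[$server]'
            else
                pending["$server"]="$uptime"
            fi
        done
        (( ${#pending[@]} == 0 )) && break
        if (( SECONDS >= deadline )); then
            printf 'Timed out waiting for: %s\n' "${!pending[*]}" >&2
            return 1
        fi
        sleep "${POLL_INTERVAL:-30}"
    done
    return 0
}
```

The caller can then decide what to do with a non-zero exit status, for example stop before rebooting the next group instead of looping forever.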