Check running status of a background process launched from same bash script

I have to write a bash script that launches a process in the background according to the command-line argument passed, and reports whether it was able to launch the program successfully.

Here is pseudocode for what I am trying to achieve:

if [ "$1" = "PROG_1" ] ; then
    ./launchProg1 &
    if [ isLaunchSuccess ] ; then
        echo "Success"
    else
        echo "failed"
        exit 1
    fi
elif [ "$1" = "PROG_2" ] ; then
    ./launchProg2 &
    if [ isLaunchSuccess ] ; then
        echo "Success"
    else
        echo "failed"
        exit 1
    fi
fi

The script cannot wait or sleep, since it will be called by another mission-critical C++ program and needs high throughput (in terms of processes started per second); moreover, the running time of the processes is unknown. The script neither needs to capture any input/output nor waits for the launched process to complete.

I have unsuccessfully tried the following:

#Method 1
if [ "$1" = "KP1" ] ; then
    echo "The Arguement is KP1"
    ./kp 'this is text' &
    if [ $? = "0" ] ; then
        echo "Success"
    else
        echo "failed"
        exit 1
    fi
elif [ "$1" = "KP2" ] ; then
    echo "The Arguement is KP2"
    ./NoSuchCommand 'this is text' &
    if [ $? = "0" ] ; then
        echo "Success"
    else
        echo "failed"
        exit 1
    fi
#Method 2
elif [ "$1" = "CD5" ] ; then
    echo "The Arguement is CD5"
    cd "doesNotExist" &
    PROC_ID=$!
    echo "PID is $PROC_ID"
    if kill -0 "$PROC_ID" ; then
        echo "Success"
    else
        echo "failed"
        exit 1
    fi
#Method 3
elif [ "$1" = "CD6" ] ; then
    echo "The Arguement is CD6"
    cd .. &
    PROC_ID=$!
    echo "PID is $PROC_ID"
    ps -eo pid | grep "$PROC_ID" && { echo "Success"; exit 0; }
    ps -eo pid | grep  "$PROC_ID" || { echo "failed" ; exit 1; }
else
    echo "Unknown Argument"
    exit 1
fi

Running the script gives unreliable output. Methods 1 and 2 always return Success, while Method 3 returns failed when the process finishes before the checks run.

Here is a sample run, tested on GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu) and GNU bash, version 4.3.11(1)-release (x86_64-pc-linux-gnu):

[scripts]$ ./processStarted3.sh KP1
The Arguement is KP1
Success
[scripts]$ ./processStarted3.sh KP2
The Arguement is KP2
Success
./processStarted3.sh: line 13: ./NoSuchCommand: No such file or directory
[scripts]$ ./processStarted3.sh CD6
The Arguement is CD6
PID is 25050
failed

Similar questions suggest matching on process names, but I cannot do that here because the same program may be executed several times, and the other suggestions cannot be applied.

I have not tried screen or tmux, since getting permission to install them on production servers won't be easy (but I will do so if that is the only option left).

UPDATE
@ghoti
./kp is a program which exists, and launching it returns Success. ./NoSuchCommand does not exist. Still, as you can see from the (edited) output, the script incorrectly returns Success.

It does not matter when the process completes or whether the program terminates abnormally. Programs launched via the script are not tracked in any way (hence we do not store the PID in any table, nor is there any need to use daemontools).

@Etan Reisner
An example of a program which fails to launch is ./NoSuchCommand, which does not exist. Another would be a corrupted program which fails to start.

@Vorsprung
Calling a script which launches a program in the background does not take a lot of time (and is manageable as per our expectations). But sleep 1 will accumulate over time and cause issues.

The aforementioned #Method3 works fine, barring processes which terminate before the ps -eo pid | grep "$PROC_ID" && { echo "Success"; exit 0; } check can be performed.

Here is an example which shows whether a process started successfully or not.

#!/bin/bash
$1 & #executes a program in background which is provided as an argument
pid=$! #stores executed process id in pid
count=$(ps -A| grep $pid |wc -l) #check whether process is still running
if [[ $count -eq 0 ]] #if process is already terminated, then there can be two cases, the process executed and stop successfully or it is terminated abnormally
then
        if wait $pid; then #checks if process executed successfully or not
                echo "success"
        else                    #process terminated abnormally
                echo "failed (returned $?)"
        fi
else
        echo "success"  #process is still running
fi

#Note: The above script only reports whether the process started successfully or not. If the process starts successfully and later terminates abnormally, this script will not report a correct result.

The accepted answer doesn't work as advertised.

The count in this check will always be at least 1 because "grep $pid" will find both the process with $pid if it exists and the grep.

count=$(ps -A| grep $pid |wc -l)
if [[ $count -eq 0 ]]
then
    ### We can never get here
else
    echo "success"  #process is still running
fi

Changing the above to check for a count of 1 or excluding the grep from the count should make the original work.

Here is an alternate (maybe simpler) implementation of the original example.

#!/bin/bash
$1 & # executes a program in background which is provided as an argument
pid=$! # stores executed process id in pid

# check whether process is still running
# The "[^[]" excludes the grep from finding itself in the ps output
if ps | grep "$pid[^[]" >/dev/null
then
    echo "success (running)"  # process is still running
else
    # If the process is already terminated, then there are 2 cases:
    # 1) the process executed and stop successfully
    # 2) it is terminated abnormally

    if wait $pid # check if process executed successfully or not
    then
        echo "success (ran)"
    else
        echo "failed (returned $?)" # process terminated abnormally
    fi
fi

# Note: The above script will detect whether a process started successfully or not. If the process is running when we check but later terminates abnormally, this script will not detect that.
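
A related caveat when grepping ps output for a PID is that grep matches substrings, so PID 123 also matches 1234. Here is a small sketch of the difference, using made-up ps-style lines rather than real ps output (the sample text and variable names are assumptions for illustration):

```bash
#!/bin/bash
pid=123

# Made-up lines in the "PID TTY TIME CMD" layout; not real ps output.
sample='  123 pts/0    00:00:00 kp
 1234 pts/0    00:00:00 bash
 5123 ?        00:00:01 sshd'

# Substring match: counts every line that contains "123" anywhere.
grep_count=$(printf '%s\n' "$sample" | grep -c "$pid")

# Exact match on the first column only.
exact_count=$(printf '%s\n' "$sample" |
    awk -v p="$pid" '$1 == p { n++ } END { print n + 0 }')

echo "grep: $grep_count, exact: $exact_count"   # prints "grep: 3, exact: 1"
```

Against live output, the same exact-match idea could be written as ps -A | awk -v p="$pid" '$1 == p'.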

use jobs.

for demonstration, put the following in a bash script and execute it:

#!/bin/bash

echo === still running ===================
{ sleep 1 ; echo done ; } &
sleep 0.1
jobs
wait

echo === done with zero exit status ======
echo done &
sleep 0.1
jobs
wait

echo === done with nonzero exit status ===
false &
sleep 0.1
jobs
wait

echo === command not found ===============
notexisting &
sleep 0.1
jobs
wait

echo === not executable ==================
./existingbutnotexecutable &
sleep 0.1
jobs
wait

output

$ ./jobcontrol.sh 
=== still running ===================
[1]+  Running                 { sleep 1; echo done; } &
done
=== done with zero exit status ======
done
[1]+  Done                    echo done
=== done with nonzero exit status ===
[1]+  Exit 1                  false
=== command not found ===============
jobcontrol.sh: line 26: notexisting: command not found
[1]+  Exit 127                notexisting
=== not executable ==================
jobcontrol.sh: line 33: ./existingbutnotexecutable: Permission denied
[1]+  Exit 126                ./existingbutnotexecutable

(the file existingbutnotexecutable must exist and must not be executable)

from the output of jobs we can distinguish between:

  • a background job that is still running
  • a job that is done running
  • a job that is done running with a nonzero exit status
  • a job that could not run because the command was not found
  • and a job that could not run because the file is not executable.

maybe there are even more cases but i did not research more.

the wait is there to make sure that no more than one background job exists at a time. this is only for test and demonstration purposes. you can omit the wait for the production release.

the sleep 0.1, on the other hand, is there to prevent a race condition. jobs is really fast: it runs and reports its result before the background job has properly started. without the sleep, jobs seems to always say "running" and always finishes before the background command produces its result, error or not.

maybe there are other ways to prevent the race without sleep. i did not research that deeply. in my tests sleep 0 still fails (race condition) about 1 out of 10 times. maybe sleep 0.01 is reliable enough and fast enough.
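
one possible way to catch the "not found" and "not executable" cases without any sleep is to check the command before launching it. this is a different technique from jobs, sketched here under the assumption that command -v is an acceptable test. can_launch and launch are hypothetical helper names, and the check cannot detect a corrupted program or a file that disappears between the check and the launch.

```bash
#!/bin/bash

# can_launch: hypothetical helper. returns 0 if the shell can resolve "$1"
# to something it could run (a file with execute permission, a builtin, ...).
can_launch() {
    command -v "$1" >/dev/null 2>&1
}

# launch: hypothetical helper. starts "$@" in the background only if the
# pre-check passes, and reports the outcome without waiting or sleeping.
launch() {
    if can_launch "$1"; then
        "$@" &
        echo "status: launched as $!"
    else
        echo "status: did not run"
        return 1
    fi
}
```

for example, launch ./kp 'this is text' would start the program, while launch ./NoSuchCommand would print "status: did not run" and return 1.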


here is an example for human friendly output based on the output of jobs

#!/bin/bash

isrunsuccess() {
  sleep 0.1
  case $(jobs) in
    *Running*)   echo "status: running" ;;
    *Done*)      echo "status: done" ;;
    *Exit\ 127*) echo "status: not found" ;;
    *Exit\ 126*) echo "status: not executable" ;;
    *Exit*)      echo "status: done nonzero exitstatus" ;;
  esac
}

echo === still running ===================
{ sleep 1 ; echo done ; } &
isrunsuccess
wait

echo === done with zero exit status ======
echo done &
isrunsuccess
wait

echo === done with nonzero exit status ===
false &
isrunsuccess
wait

echo === command not found ===============
notexisting &
isrunsuccess
wait

echo === not executable ==================
./existingbutnotexecutable &
isrunsuccess
wait

output

$ ./jobcontrol.sh 
=== still running ===================
status: running
done
=== done with zero exit status ======
done
status: done
=== done with nonzero exit status ===
status: done nonzero exitstatus
=== command not found ===============
./jobcontrol.sh: line 41: notexisting: command not found
status: not found
=== not executable ==================
./jobcontrol.sh: line 47: ./existingbutnotexecutable: Permission denied
status: not executable

you can merge the "did run" and "did not run" cases

isrunsuccess() {
  sleep 0.1
  case $(jobs) in
    *Exit\ 127*|*Exit\ 126*) echo "status: did not run" ;;
    *Running*|*Done*|*Exit*) echo "status: did run or still running" ;;
  esac
}

output

$ ./jobcontrol.sh 
=== still running ===================
status: did run or still running
done
=== done with zero exit status ======
done
status: did run or still running
=== done with nonzero exit status ===
status: did run or still running
=== command not found ===============
./jobcontrol.sh: line 50: notexisting: command not found
status: did not run
=== not executable ==================
./jobcontrol.sh: line 56: ./existingbutnotexecutable: Permission denied
status: did not run

other methods to check contents of string in bash: How do you tell if a string contains another string in POSIX sh?

the bash documentation stating that exit status 127 means not found and 126 means not executable: https://www.gnu.org/software/bash/manual/html_node/Exit-Status.html
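
the two exit statuses can be reproduced directly. this is a minimal sketch in which the missing path is a made-up name and the non-executable file is a temporary file from mktemp. wait is used only to make the demonstration deterministic.

```bash
#!/bin/bash

# hypothetical path that is assumed not to exist
missing="./no_such_command_for_demo"
# mktemp creates a regular file without execute permission
notexec=$(mktemp)

"$missing" 2>/dev/null &
wait "$!"
missing_status=$?

"$notexec" 2>/dev/null &
wait "$!"
notexec_status=$?

rm -f "$notexec"
echo "missing: $missing_status, not executable: $notexec_status"
# prints "missing: 127, not executable: 126"
```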

sorry missed this requirement "Script cannot wait or sleep"

Launch the background program and get its pid. Wait a second. Then check that it is still running with kill -0.

The kill -0 status is taken from $? and is used to decide whether the process is still running.

#!/bin/bash

./$1 &
pid=$!

sleep 1;

# kill -0 sends no signal; it only tests whether the pid can be signalled
kill -0 "$pid" 2>/dev/null
stat=$?
if [ "$stat" -eq 0 ] ; then
  echo "running as $pid"
  exit 0
else
  echo "$pid did not start"
  exit 1
fi
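
To see what kill -0 reports in each state, here is a small sketch using sleep 30 as a stand-in for the real program (the stand-in and the variable names are assumptions for illustration):

```bash
#!/bin/bash

# Stand-in for the launched program.
sleep 30 &
pid=$!

# kill -0 sends no signal; it only checks that the process exists.
kill -0 "$pid" 2>/dev/null
alive_while_running=$?      # 0: the process exists

# Stop the stand-in and let the shell collect its exit status.
kill "$pid"
wait "$pid" 2>/dev/null

kill -0 "$pid" 2>/dev/null
alive_after_exit=$?         # nonzero: no such process any more

echo "while running: $alive_while_running, after exit: $alive_after_exit"
```

Note that a program which fails to start and a program which ran and already finished both give the nonzero result, so kill -0 alone cannot tell them apart.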

Maybe if your super speedy C++ program cannot wait for a second, it also cannot expect to be able to launch a load of shell commands at a high rate per second?

Maybe you need to implement a queue here?

Sorry for more questions than answers
