How to make two applications restart each other on Linux?

The situation is as follows:

We have a main application and a watcher application. Both are C++ applications, and both call daemon(1,0) to run in the background.

The watcher checks whether the main application is running. The two applications talk to each other over TCP, which is how the watcher knows whether main has hung. If the main process is absent (crashed) or does not respond, the watcher starts or restarts it.

The TCP settings for the connection can be changed by the user, and this is done through the main app. After a change, the watcher must be restarted to load the new configuration. That restart is also done from the main app.

As it is, it works fine.
1. On startup Main app DOES kill existing watcher process and runs it again. [This is correct]
2. Watcher app DOES kill main and runs it again. [This is correct]

BUT

  1. If I run Main, which in turn starts Watcher,
  2. then kill Main so that Watcher is left alone,
  3. Watcher sees that there is no Main anymore and starts it again.
  4. Main starts again, kills the Watcher and tries to start it again...
  5. and at this point something nonsensical happens. It starts the Watcher (I can see the TCP port being taken in the netstat output), but there is no process named Watcher.

If normally netstat shows tcp 0 0 IP:TCP_PORT LISTEN Watcher , now it shows tcp 0 0 IP:TCP_PORT LISTEN Main .

It is as if watcher is there, but inside the Main process.

I use scripts to run applications. Watcher uses this

#!/bin/sh
killall -9 Main
./Main

And runs it like system("./runMain.sh&");

Main uses this

#!/bin/sh
killall -9 Watcher
./Watcher

And runs it like system("./runWatcher.sh&");

What am I doing wrong? How do I run them so that they can restart each other when needed and always start as separate processes?

So far I have also tried running the scripts with nohup; the result is the same.

EDIT 1:

Note: numbers here are just for clarity. In reality PID is not 1 of course.

  1. I run Main. netstat shows me:

    tcp 0 0 192.168.0.1:7000 LISTEN (PID 1)Main
    tcp 0 0 192.168.0.1:7001 LISTEN (PID 1)Main

  2. Main starts the Watcher using the script. Now netstat shows me:

    tcp 0 0 192.168.0.1:7000 LISTEN (PID 1)Main
    tcp 0 0 192.168.0.1:7001 LISTEN (PID 1)Main
    tcp 0 0 192.168.0.1:8000 LISTEN (PID 2)Watcher

  3. Now I manually kill Main with killall -9 Main. netstat then shows me:

    tcp 0 0 192.168.0.1:7000 LISTEN (PID 2)Watcher
    tcp 0 0 192.168.0.1:7001 LISTEN (PID 2)Watcher
    tcp 0 0 192.168.0.1:8000 LISTEN (PID 2)Watcher

    Notice the change in who owns the listening sockets now? How did that happen?

  4. Watcher sees that Main is gone and so it starts it using the script file.

  5. Main kills the Watcher on startup. Netstat shows:

    tcp 0 0 192.168.0.1:7000 LISTEN (PID 3)Main
    tcp 0 0 192.168.0.1:7001 LISTEN (PID 3)Main
    tcp 0 0 192.168.0.1:8000 LISTEN (PID 3)Main

And that's it. Watcher never runs again. I tried to debug it in Eclipse: Watcher crashes, without throwing anything, right on the line daemon(1,0).

How about using a custom signal (or even listening on another port for admin commands)? kill -9 cannot be caught, so the killed process gets no chance to clean up, and any child it started keeps the descriptors it inherited. That is how a child process ends up holding the parent's resources (ports, etc.).

On top of that, when Main is started by the Watcher, why does it assume that the running instance of Watcher should be killed? At that point Watcher is the parent of Main, so I can see how killing it could cause trouble.
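The netstat output in the question is consistent with plain descriptor inheritance: system() forks and execs a shell, and every open descriptor that is not marked close-on-exec, listening sockets included, is carried into the new program. A minimal sketch of marking a socket close-on-exec follows. The helper names set_cloexec, has_cloexec and make_listen_socket are made up for illustration and are not from the original code; on Linux, passing SOCK_CLOEXEC to socket() should achieve the same in one step.

```cpp
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Mark a descriptor close-on-exec so that programs started through
// fork()+exec() (which is what system() does) do not inherit it.
// Returns true on success.
bool set_cloexec(int fd) {
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1) return false;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) != -1;
}

// Returns true if the descriptor currently has FD_CLOEXEC set.
bool has_cloexec(int fd) {
    int flags = fcntl(fd, F_GETFD);
    return flags != -1 && (flags & FD_CLOEXEC) != 0;
}

// Hypothetical helper: create a TCP socket that will not leak into
// child programs. bind() and listen() would follow as usual.
int make_listen_socket() {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) return -1;
    if (!set_cloexec(fd)) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The same flag would need to be set on sockets returned by accept(), since the flag is per descriptor and is not copied from the listening socket.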

It comes down to the need for the two processes to communicate outside of the 'kill' signal.
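As one possible shape for this, the Watcher could reload its TCP settings when it receives SIGUSR1 instead of being killed and restarted. The sketch below is only an assumption about how that might look: the choice of SIGUSR1 and the names g_reload_requested, install_reload_handler and consume_reload_request are illustrative, not taken from the original applications.

```cpp
#include <signal.h>
#include <unistd.h>

// Set by the signal handler, polled by the main loop.
volatile sig_atomic_t g_reload_requested = 0;

// Handlers should only do async-signal-safe work, so just set a flag.
extern "C" void on_reload_signal(int) {
    g_reload_requested = 1;
}

// Install the SIGUSR1 handler. Returns true on success.
bool install_reload_handler() {
    struct sigaction sa {};
    sa.sa_handler = on_reload_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;  // restart interrupted system calls
    return sigaction(SIGUSR1, &sa, nullptr) == 0;
}

// Call from the main loop. Returns true once per pending request,
// at which point the caller would re-read its configuration.
bool consume_reload_request() {
    if (!g_reload_requested) return false;
    g_reload_requested = 0;
    return true;
}
```

Main would then request a reload with kill(watcher_pid, SIGUSR1), or killall -USR1 Watcher from a script, and the Watcher process keeps running with its own PID and sockets.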

Use a semaphore or some other OS-level communication mechanism to coordinate between the two.
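A lock file is one such mechanism, swapped in here for the semaphore because it is released automatically when the holder dies, even from kill -9. In this sketch each application would hold an exclusive flock() on its own lock file for as long as it runs, so the other side can test whether it is alive without killing anything. The function name acquire_instance_lock and the lock path are hypothetical.

```cpp
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

// Try to take an exclusive, non-blocking lock on lock_path.
// On success, returns the open descriptor; keep it open for the
// lifetime of the process. Returns -1 if another holder exists
// or if the file cannot be opened.
int acquire_instance_lock(const char* lock_path) {
    // O_CLOEXEC keeps the lock from leaking into child programs.
    int fd = open(lock_path, O_RDWR | O_CREAT | O_CLOEXEC, 0644);
    if (fd == -1) return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

On startup Main could try acquire_instance_lock on the Watcher's lock file: if the call fails, a Watcher is presumably already running and does not need to be killed and restarted.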
