Monitoring threads

I have an app that spawns multiple threads, each running fairly complex logic. Sometimes a thread deadlocks or runs into some other problem, and I would like to be notified when a thread is no longer in a "healthy" state.

Options that come to my mind:

  1. Log, and monitor the logs.
  2. Have each thread update a shared in-memory object, for example a dictionary keyed by thread id whose values are a status structure (a status such as OK, plus the time of the last update). A separate thread then watches that dictionary; if a thread hasn't updated its status for X minutes, something has gone wrong.

But I have a feeling that this isn't the best solution. Is there any other good practice for monitoring threads? Maybe each thread could update its own global state with a timestamp, and the process could read that information?
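A minimal Python sketch of option 2, assuming Python for illustration (the question doesn't name a language). `StatusBoard`, `report`, `overdue`, `Status` and `watchdog` are hypothetical names, not an existing library. Workers call `report()` from inside their own loop; a watchdog thread periodically asks which threads have gone quiet. It uses `time.monotonic()` so wall-clock adjustments can't trigger false alarms.

```python
import threading
import time
from dataclasses import dataclass

@dataclass
class Status:
    state: str         # e.g. "OK", or a short note on what the thread is doing
    updated_at: float  # time.monotonic() of the last update

class StatusBoard:
    """Shared dictionary of thread id -> Status, guarded by a lock (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._statuses = {}

    def report(self, state="OK"):
        # Called periodically by each worker thread to say it is still making progress.
        with self._lock:
            self._statuses[threading.get_ident()] = Status(state, time.monotonic())

    def overdue(self, max_age):
        # Ids of threads that have not reported within the last max_age seconds.
        now = time.monotonic()
        with self._lock:
            return [tid for tid, s in self._statuses.items()
                    if now - s.updated_at > max_age]

def watchdog(board, max_age, interval, on_overdue, stop):
    # Runs in its own thread; checks the board every `interval` seconds until
    # `stop` is set (Event.wait returns True once the event is set).
    while not stop.wait(interval):
        for tid in board.overdue(max_age):
            on_overdue(tid)
```

In practice `max_age` should be several times the normal reporting interval, so a thread that is merely busy isn't flagged as stuck.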

A few ideas come to mind; I'll outline them below.

1 Make Your Code More Robust

Obviously, the easiest way to keep your threads healthy is to account for these unhealthy states in the first place. If you're running into deadlocks, use locks and semaphores more carefully: debug why the deadlocks happen and find a way to handle those bad scenarios.
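As one hedged example of "using locks more carefully", here is a Python sketch of two common deadlock defences: acquiring multiple locks in one fixed global order, and using a timeout instead of blocking forever so a likely deadlock becomes a visible error. `locks_held` and `LockTimeout` are hypothetical names; ordering by `id()` is just one convenient way to get a consistent order for locks that live as long as the program.

```python
import threading
from contextlib import contextmanager

class LockTimeout(Exception):
    """Raised when a lock is not acquired in time, a likely sign of deadlock."""

@contextmanager
def locks_held(locks, timeout):
    """Acquire several locks in a fixed global order, waiting at most `timeout` seconds for each."""
    # Sorting by id() gives every thread the same acquisition order, so two
    # threads can never each hold one lock while waiting for the other's.
    ordered = sorted(locks, key=id)
    acquired = []
    try:
        for lock in ordered:
            if not lock.acquire(timeout=timeout):
                raise LockTimeout(f"could not acquire {lock!r} within {timeout} s")
            acquired.append(lock)
        yield
    finally:
        # Release in reverse order, including after a timeout or if the body raises.
        for lock in reversed(acquired):
            lock.release()
```

A caller writes `with locks_held([lock_a, lock_b], timeout=5): ...` and can catch `LockTimeout` to log or report the problem rather than hanging silently.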

2 Agents & Error Handlers

This is related to the above. If you can get to the point where you can reliably detect these problems even though you can't fix them, you can build a system in which a thread recognises that it is in a dangerous state and messages the master thread to say so. The master thread can then shut down the affected thread, spin up a new one from a known "safe state", and carry on from there. Process-based languages like Elixir (which runs on the Erlang VM) rely heavily on this style of protection, using supervisors that restart failed processes.
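A simplified Python sketch of this supervisor idea, assuming the worker signals a bad state by raising an exception. `BadState`, `supervise` and `make_safe_state` are hypothetical names, and this is not a port of Elixir/OTP supervisors. One important difference: Python cannot kill a thread from outside, so "shutting down" here means the worker reports the problem and exits by itself, after which the supervisor starts a fresh thread from a new safe state.

```python
import queue
import threading

class BadState(Exception):
    """Raised by worker code when it detects it can no longer continue safely."""

def _run(work, safe_state, outbox):
    # Runs in the worker thread: send the result or the failure to the supervisor.
    try:
        outbox.put(("ok", work(safe_state)))
    except Exception as exc:  # BadState or any unexpected crash
        outbox.put(("failed", exc))

def supervise(work, make_safe_state, max_restarts=3):
    """Run `work` in a thread and restart it from a fresh safe state when it fails."""
    outbox = queue.Queue()  # worker -> supervisor messages
    for attempt in range(max_restarts + 1):
        worker = threading.Thread(target=_run, args=(work, make_safe_state(), outbox))
        worker.start()
        status, payload = outbox.get()  # block until the worker reports back
        worker.join()  # the worker exits on its own after reporting
        if status == "ok":
            return payload
        print(f"attempt {attempt}: worker failed with {payload!r}, restarting")
    raise RuntimeError(f"worker still failing after {max_restarts} restarts")
```

Note that `outbox.get()` waits forever if a worker hangs without reporting, so in practice this pairs naturally with a heartbeat check like the one described in the next section.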

3 Logging/Polling

If there really is no way for you to tell what is going on (there always is; it's just sometimes too difficult to find), then having the thread update a shared resource at a fixed interval is fairly simple to implement. Every minute, have your thread write the current time into a shared value (a heartbeat), and have your main thread check that value every few minutes, leaving plenty of slack in case the thread is too busy to update it on time. If, say, 5 minutes go by without an update, you can be fairly sure the thread is deadlocked or otherwise stuck. If you want to investigate after the fact, you can also write these updates to a log file together with a stack trace, so you can see where the thread is getting stuck.
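For the "log a stack trace to see where it's stuck" part, a minimal Python sketch: when the monitoring thread notices an overdue heartbeat, it can capture the stalled thread's current stack with `sys._current_frames()` (a documented CPython function despite the leading underscore) and `traceback.format_stack`. `stack_of` and `report_stalled` are hypothetical helper names; how you detect the overdue heartbeat is left to your own polling code.

```python
import logging
import sys
import threading
import traceback

log = logging.getLogger("watchdog")

def stack_of(thread_id):
    """Return the current stack of a live thread as a string, or None if it is not running."""
    frame = sys._current_frames().get(thread_id)  # thread id -> that thread's topmost frame
    if frame is None:
        return None
    return "".join(traceback.format_stack(frame))

def report_stalled(thread_id, silent_for):
    # Call from the monitoring thread when a heartbeat is overdue; the logged
    # stack shows the line the stalled thread is blocked on (e.g. a lock acquire).
    stack = stack_of(thread_id)
    log.warning("thread %s silent for %.0f s; stack:\n%s",
                thread_id, silent_for, stack or "<thread is not running>")
    return stack
```

The thread id passed in is the same value a worker gets from `threading.get_ident()` (or `Thread.ident`), so it can be used directly as the heartbeat dictionary key.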

Conclusion

To conclude: if you can fix, or reliably detect, an error or unhealthy state, you can write code to handle it. If you can't, your next goal should be to get to the point where you can.

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM