
Debugging a Multi-Threaded Server

I was asked this in an interview, and now I'm curious because I don't think the interviewer was satisfied with my answer. Here's the question:

A multi-threaded server application stops working, and the last log message from the application is:

"Some Server Related Message..."

The code looks like:

CalledFunc ()
{
    Code ...

    Acquiring Thread lock
    Line printing "Some Server Related Message..."
    Func();
    Releasing Thread Lock
}

  1. What should the programmer in charge do to debug this?
  2. What went wrong in Func()?
  3. If an exception is thrown in Func(), what should be done to fix the problem?

Reason #1: it's a database problem. This may sound strange, but the main reason an application server hangs is not directly related to the application server itself. The location of the symptom is rarely the location of the root cause. The following scenario is quite common:

The database is bottlenecked, causing queries to run slower than usual. Requests that used to take 1 second now take 5 seconds to complete. The average number of concurrent requests slowly increases (due to the backlog). The server runs out of threads and the application server hangs. If you manage to get a thread dump, you'll just see a bunch of threads waiting and another group that's actually running. Another possibility is that the waiting (or queued) threads gobble up all available memory and eventually lead to an OutOfMemory error.
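The backlog arithmetic in this scenario can be sketched with Little's law (average requests in flight = arrival rate × average latency). The helper names, and the arrival rate of 40 requests per second and pool size of 100 used in the note below, are assumptions chosen for illustration and are not from the original answer.

```cpp
#include <cassert>

// Little's law: average requests in flight = arrival rate * average latency.
// Hypothetical helper for illustration only.
double requests_in_flight(double arrivals_per_second, double latency_seconds) {
    assert(arrivals_per_second >= 0.0 && latency_seconds >= 0.0);
    return arrivals_per_second * latency_seconds;
}

// True when the steady-state load needs more threads than the pool has,
// so requests start to queue and the server appears to hang.
bool pool_exhausted(double arrivals_per_second, double latency_seconds,
                    int pool_size) {
    return requests_in_flight(arrivals_per_second, latency_seconds) > pool_size;
}
```

Under these assumed numbers, 40 requests per second at 1 second each keep about 40 threads busy, which fits in a pool of 100. At 5 seconds each the same traffic needs about 200 threads, so the pool runs dry even though the application code has not changed.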

Reason #2: deadlocks. If it seems that the application server is doing nothing, look for deadlocks. These can be database deadlocks that cause your SQL queries, update statements in particular, to hang. For example, a transaction log that is written to the database for each request may easily hang the entire application if the log table is locked. Also check for shared objects, such as an operating system file that is written to from multiple threads at once.

Reason #3: runaway thread. In cases where the application server is indeed to blame, you should look for a runaway thread. These are hard to detect because they hardly show up in logs: log entries are usually written only when a request has completed. A runaway thread will probably not return until it has already affected the entire application, so the hanging request is never written to the log. Runaway threads typically involve infinite loops, or code that consumes so much heap memory that the process runs out of memory. For example, a query whose results page offers no paging suddenly needs to display a large number of results. The page takes forever to render and clobbers the application server, eventually causing it to hang.
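One way to make a hanging request visible is to log when it starts as well as when it finishes. The following is a minimal sketch of that idea, not something the original answer prescribes; `Log`, `RequestTrace`, and `handle_request` are hypothetical names, and the in-memory log stands in for a real log sink.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// In-memory stand-in for a real log sink (an assumption for this sketch).
class Log {
public:
    void write(const std::string& line) {
        std::lock_guard<std::mutex> guard(mutex_);
        lines_.push_back(line);
    }
    std::vector<std::string> lines() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return lines_;
    }
private:
    mutable std::mutex mutex_;
    std::vector<std::string> lines_;
};

// Logs when a request starts and again when it finishes. A request that
// never returns leaves a "begin" line without a matching "end" line.
class RequestTrace {
public:
    RequestTrace(Log& log, std::string name)
        : log_(log), name_(std::move(name)) {
        log_.write("begin " + name_);
    }
    ~RequestTrace() { log_.write("end " + name_); }
    RequestTrace(const RequestTrace&) = delete;
    RequestTrace& operator=(const RequestTrace&) = delete;
private:
    Log& log_;
    std::string name_;
};

// Hypothetical request handler used only to show the trace in action.
void handle_request(Log& log, const std::string& name) {
    assert(!name.empty());
    RequestTrace trace(log, name);
    // ... request work would go here ...
}
```

With a trace like this, a runaway request shows up as a "begin" entry that never gets its "end", which narrows the search even when the thread itself never returns.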

It is likely that either:

  • Func() is trying to acquire the lock again (easy to check), or
  • Func() has thrown an exception while the lock was held (more likely, and more subtle)

So:

  1. Check the code of Func() to verify that all possible paths (exceptions included) release the lock
  2. One of the two options above
  3. Release the lock before throwing the exception, or catch the exception in CalledFunc() and release the lock
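The catch-and-release option from point 3 might look like the sketch below. It assumes the "Thread lock" of the question is a `std::mutex`; `server_lock` and the `fail` parameter are hypothetical and exist only to exercise the error path.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

std::mutex server_lock;  // stands in for the "Thread lock" of the question

// Hypothetical Func(): throws on demand so the error path can be exercised.
void Func(bool fail) {
    if (fail) {
        throw std::runtime_error("Func failed");
    }
}

// Manual fix: release the lock on the error path, then rethrow so the
// caller still sees the failure.
void CalledFunc(bool fail) {
    server_lock.lock();
    try {
        Func(fail);
    } catch (...) {
        server_lock.unlock();
        throw;
    }
    server_lock.unlock();
}
```

This works, but every new exit path in CalledFunc() has to remember to unlock, which is easy to get wrong as the function grows.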

To overcome the problem with the exception in Func() you can use a scoped lock. RAII is a good way to ensure exception safety and to avoid leaks in general. That link also happens to use a mutex as its example.
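A scoped lock could be sketched as follows, assuming C++11 or later and a `std::mutex` in place of the question's thread lock. `server_lock` and the `fail` parameter are hypothetical names added for illustration.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

std::mutex server_lock;  // stands in for the "Thread lock" of the question

// Hypothetical Func(): throws on demand so the error path can be exercised.
void Func(bool fail) {
    if (fail) {
        throw std::runtime_error("Func failed");
    }
}

void CalledFunc(bool fail) {
    // The guard locks the mutex here and unlocks it in its destructor.
    std::lock_guard<std::mutex> guard(server_lock);
    Func(fail);
}   // guard is destroyed on normal return and during stack unwinding
```

The lock is released on every exit path without any explicit unlock call. Note that the release on the exception path relies on stack unwinding, so the exception needs to be caught somewhere up the call stack.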

Also, seeing that line in the log doesn't mean that the problem comes from this part of the code.

I think they were looking for this:

What should the programmer in charge do to debug this?

Get a hang dump of the process and then use WinDbg to figure out the cause; if it's a deadlock, it will be obvious from the dump.

What went wrong in Func()?

From what the next question asks, we can assume it must have thrown an exception at some point, causing the lock to never be released, or it attempted to acquire the lock again, causing a deadlock.

If an exception is thrown in Func(), what should be done to fix the problem?

Use RAII to be exception safe and for better, cleaner code.

Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this post, please credit this site or the original URL.

 