
How to handle a deadlock in third-party code

We have a third-party method Foo which sometimes runs into a deadlock for unknown reasons.

We are running a single-threaded TCP server and call this method every 30 seconds to check that the external system is available.

To mitigate the deadlock in the third-party code, we put the ping call in a Task.Run so that the server itself does not deadlock.

Like this:

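async Task<bool> WrappedFoo()
{
    var timeout = 10000; // milliseconds

    var task = Task.Run(() => ThirdPartyCode.Foo());
    var delay = Task.Delay(timeout);

    // Whichever task finishes first decides the result.
    if (delay == await Task.WhenAny(delay, task))
    {
        // Timed out. Foo may still be blocking its thread-pool thread.
        return false;
    }
    else
    {
        return await task;
    }
}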

But this (in our opinion) has the potential to starve the application of free threads. If a call to ThirdPartyCode.Foo deadlocks, its thread never recovers, and if this happens often enough we might run out of resources.

Is there a general approach to handling deadlocking third-party code?

A CancellationToken won't work because the third-party API does not provide any cancellation options.

Update: The method at hand is from the SAPNCO.dll provided by SAP to establish and test RFC connections to an SAP system, so it is not a simple network ping. I renamed the method in the question to avoid further misunderstandings.

Is there a general approach how one should handle deadlocking third-party code?

Yes, but it's not easy or simple.

The problem with misbehaving code is that it can not only leak resources (e.g., threads), but it can also hold onto important resources indefinitely (e.g., some internal "handle" or "lock").

The only way to forcefully reclaim threads and other resources is to end the process. The OS is used to cleaning up misbehaving processes and is very good at it. So, the solution here is to start a child process to do the API call. Your main application can communicate with its child process through redirected stdin/stdout, and if the child process ever times out, the main application can terminate it and restart it.

This is, unfortunately, the only reliable way to cancel uncancelable code.
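The child-process idea above might be sketched roughly as follows. Everything specific here is an assumption made for illustration: the worker executable is a hypothetical small console program that calls ThirdPartyCode.Foo() and writes "true" or "false" to stdout, and the names IsolatedFoo and RunAsync are invented. For simplicity this sketch starts one short-lived process per check instead of keeping a long-running child alive.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;
using System.Threading.Tasks;

public static class IsolatedFoo
{
    // workerPath: hypothetical helper executable that calls ThirdPartyCode.Foo()
    // and prints "true" or "false" on a single line (an assumed protocol).
    public static async Task<bool> RunAsync(string workerPath, TimeSpan timeout)
    {
        var startInfo = new ProcessStartInfo(workerPath)
        {
            RedirectStandardOutput = true,
            UseShellExecute = false,   // required for stream redirection
            CreateNoWindow = true
        };

        Process process;
        try
        {
            process = Process.Start(startInfo);
        }
        catch (Win32Exception)
        {
            return false; // the worker could not be started
        }
        if (process == null) return false;

        using (process)
        {
            Task<string> readTask = process.StandardOutput.ReadLineAsync();
            Task finished = await Task.WhenAny(readTask, Task.Delay(timeout));

            if (finished != readTask)
            {
                // Presumed deadlock: ending the process lets the OS reclaim
                // its threads, handles and locks.
                TryKill(process);
                return false;
            }

            string line = await readTask;
            // Give the worker a moment to exit on its own, then force it.
            if (!process.WaitForExit(1000)) TryKill(process);
            return string.Equals(line?.Trim(), "true", StringComparison.OrdinalIgnoreCase);
        }
    }

    private static void TryKill(Process process)
    {
        try { process.Kill(); }
        catch (InvalidOperationException) { /* the process has already exited */ }
    }
}
```

A caller could then write something like `bool ok = await IsolatedFoo.RunAsync("FooWorker.exe", TimeSpan.FromSeconds(10));`, where FooWorker.exe is again only a placeholder name. A stuck call costs one killed process instead of one permanently blocked thread in the server.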

Your code isn't cancelling the blocked operation. Use a CancellationTokenSource and pass a cancellation token to Task.Run instead:

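var cts = new CancellationTokenSource(timeout);

try
{
    // Note: the token only stops the delegate from starting if it is already
    // canceled; it cannot interrupt a call that is already running.
    await Task.Run(() => ThirdPartyCode.Ping(), cts.Token);
    return true;
}
catch (TaskCanceledException)
{
    return false;
}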

It's quite possible that the blocking is caused by networking or DNS issues rather than an actual deadlock.

That still wastes a thread waiting for a network operation to complete. You could use .NET's own Ping.SendPingAsync to ping asynchronously and specify a timeout:

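// Requires: using System.Net.NetworkInformation;
using (var ping = new Ping())
{
    // ip: host name or address to ping; timeout: milliseconds
    var reply = await ping.SendPingAsync(ip, timeout);
    return reply.Status == IPStatus.Success;
}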

The PingReply class contains far more detailed information than a simple success/failure. The Status property alone differentiates between routing problems, unreachable destinations, timeouts, etc.
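As a small illustration of how the Status value could be interpreted, the following sketch maps a few IPStatus members to readable text. The class name PingStatusText, the method Describe and the wording of the messages are invented for this example; only the IPStatus enum members come from .NET.

```csharp
using System.Net.NetworkInformation;

public static class PingStatusText
{
    // Translates a few common IPStatus values into short descriptions.
    public static string Describe(IPStatus status)
    {
        switch (status)
        {
            case IPStatus.Success:
                return "reachable";
            case IPStatus.TimedOut:
                return "no reply within the timeout";
            case IPStatus.DestinationHostUnreachable:
            case IPStatus.DestinationNetworkUnreachable:
                return "destination unreachable";
            case IPStatus.TtlExpired:
                return "TTL expired in transit";
            default:
                return "failed: " + status;
        }
    }
}
```

It could be called as `PingStatusText.Describe(reply.Status)` on the reply returned by SendPingAsync.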

Cancelling a task is a cooperative operation: you pass a CancellationToken to the desired method, and externally you call CancellationTokenSource.Cancel:

public async Task Caller()
{
    using (var cts = new CancellationTokenSource())
    {
        Task longRunning = Task.Run(() => CancellableThirdParty(cts.Token), cts.Token);
        await Task.Delay(3000); // or wait for some condition/signal
        cts.Cancel();
        try
        {
            await longRunning; // observes the cancellation
        }
        catch (OperationCanceledException)
        {
            // handle the cancellation
        }
    }
}

public void CancellableThirdParty(CancellationToken token)
{
    while (true)
    {
        // Throws OperationCanceledException once cancellation is requested.
        // Alternatively, check token.IsCancellationRequested and clean up
        // before returning.
        token.ThrowIfCancellationRequested();
    }
}

As you can see in the code above, in order to cancel an ongoing task, the method running inside it must be structured around the CancellationToken.IsCancellationRequested flag or simply the CancellationToken.ThrowIfCancellationRequested method, so that the caller only has to call CancellationTokenSource.Cancel.

Unfortunately, if the third-party code is not designed around CancellationToken (it does not accept a CancellationToken parameter), then there is not much you can do.
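To make the cooperative nature concrete, here is a minimal sketch in which the delegate never checks the token. UncooperativeWork is a hypothetical stand-in for third-party code, and the class and method names are invented for the example. Once the delegate has started, cancelling the token passed to Task.Run does not stop it:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CooperativeCancellationDemo
{
    // Hypothetical stand-in for third-party code that ignores cancellation.
    private static void UncooperativeWork(TaskCompletionSource<bool> started,
                                          ManualResetEventSlim release)
    {
        started.SetResult(true); // signal that the delegate is running
        release.Wait();          // blocks until released; never checks a token
    }

    // Returns true if the task is still running after the token was canceled.
    public static async Task<bool> StillRunningAfterCancelAsync()
    {
        var started = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        using (var release = new ManualResetEventSlim(false))
        using (var cts = new CancellationTokenSource())
        {
            Task task = Task.Run(() => UncooperativeWork(started, release), cts.Token);

            await started.Task;    // wait until the delegate has actually started
            cts.Cancel();          // request cancellation
            await Task.Delay(100); // give the cancellation time to have any effect

            bool stillRunning = !task.IsCompleted;

            release.Set();         // unblock the worker so the demo can finish
            await task;
            return stillRunning;
        }
    }
}
```

The token given to Task.Run only matters before the delegate starts; after that, the running code has to observe the token itself.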
