How do I correctly handle a permanently hung third-party library call in a thread in C++?

I have a device which has a library. Some of its functions are most awesomely ill-behaved, in the "occasionally hang forever" sense.

I have a program which uses this device. If/when it hangs, I need to be able to recover gracefully and reset it. The offending calls should return within milliseconds and are called in a loop many times per second.

My first question is: when a thread running the recalcitrant function hangs, what do I do? Even if I litter the thread with interruption points, this happens:

boost::this_thread::interruption_point(); // irrelevant, in the past
deviceLibrary.thatFunction(); // <-- hangs here forever
boost::this_thread::interruption_point(); // never gets here!

The only advice I've read is to modify the function itself, but that's out of the question for a variety of reasons -- not least of which is "this is already miles outside of my skill set".

I have tried asynchronous launching with C++11 futures:

// this was in a looping thread -- it does not work: wait_for sometimes never returns
std::future<void> future = std::async(std::launch::async, 
    [this] () { deviceLibrary.thatFunction(*data_ptr); }); 
if (future.wait_for(std::chrono::seconds(timeout)) == std::future_status::timeout) { 
    printf("no one will ever read this\n"); 
    deviceLibrary.reset(); // this would work if it ever got here
}

No dice, in that or a number of variations.

I am now trying boost::asio with a thread_group of a number of worker threads running io_service::run(). It works magnificently until the second time it times out. Then I've run out of threads, because each hanging thread eats up one thread of my thread_group and never comes back.

My latest idea is to call work_threads.create_thread to make a new thread to replace the now-hanging one. So my second question is: if this is a viable way of dealing with this, how should I cope with the slowly amassing group of hung threads? How do I remove them? Is it fine to leave them there?

Incidentally, I should mention that there is in fact a version of deviceLibrary.thatFunction() that has a timeout. It doesn't.

I found this answer but it's C# and Windows specific, and this one which seems relevant. But I'm not so sure about spawning hundreds of extra processes a second (edit: oh right; I could banish all the calls to one or two separate processes. If they communicate well enough and I can share the device between them. Hm...)

Pertinent background information: I'm using MSVC 2013 on Windows 7, but the code has to cross-compile for ARM on Debian with GCC 4.6 also. My level of C++ knowledge is... well... if it seems like I'm missing something obvious, I probably am.

Thanks!

If you want to reliably kill something that's out of your control and may hang, use a separate process.

While process isolation was once considered to be very 'heavy-handed', browsers like Chrome today implement it on a per-tab basis. Each tab gets a process, the GUI has a process, and if the tab rendering dies it doesn't take down the whole browser.

How can Google Chrome isolate tabs into separate processes while looking like a single application?

Threads are simply not designed for letting a codebase defend itself from ill-behaved libraries. Processes are.
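To make the limitation concrete, here is a minimal sketch of the most a thread-based timeout can do. It assumes the futures attempt in the question stalls because the future returned by std::async with std::launch::async blocks in its destructor until the task finishes, which is what the C++ standard specifies. The helper name call_with_timeout is hypothetical. It uses a std::promise and a detached std::thread so the caller can stop waiting, but it cannot reclaim the hung thread:

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <memory>
#include <thread>

// Hypothetical helper: runs `call` on a detached thread and waits at most
// `timeout` for it. Returns true if the call finished in time, false if it
// is presumed hung. `call` must not throw: an exception escaping the thread
// would terminate the program.
bool call_with_timeout(std::function<void()> call,
                       std::chrono::milliseconds timeout) {
    // The promise is shared so it outlives this function if the call hangs.
    auto done = std::make_shared<std::promise<void>>();
    std::future<void> finished = done->get_future();

    std::thread([done, call]() {
        call();
        done->set_value();
    }).detach();

    // Unlike a future from std::async, this future does not block when it
    // is destroyed, so the caller really does get control back on timeout.
    return finished.wait_for(timeout) == std::future_status::ready;
}
```

Every call that times out leaves behind one thread that is still stuck inside the library, along with its stack and any locks or device handles it holds, until the process exits. The caller gets control back, but the leak is exactly the "slowly amassing group of hung threads" from the question.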

So define the services you need, put them all in one program that uses your flaky libraries, and use interprocess communication from your main app to speak with that bridge. If the bridge times out or has a problem due to the flakiness, kill it and restart it.
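A rough sketch of that kill-and-restart loop, under several assumptions: it is POSIX-only (on Windows the same shape would be built on CreateProcess, TerminateProcess and pipes), the class name Bridge and its members are hypothetical, the `work` callback stands in for the flaky library call, and the child is created with fork() for brevity where a real bridge would more likely be a separate executable. The parent sends one byte per request over a pipe and waits for a one-byte reply with poll():

```cpp
#include <functional>
#include <utility>
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical supervisor for one long-lived bridge process (POSIX only).
class Bridge {
public:
    // `work` stands in for the flaky library call; it runs only in the child.
    explicit Bridge(std::function<void()> work) : work_(std::move(work)) {
        signal(SIGPIPE, SIG_IGN);  // process-wide: a dead bridge must not kill us
        start();
    }
    ~Bridge() { stop(); }
    Bridge(const Bridge&) = delete;
    Bridge& operator=(const Bridge&) = delete;

    // Asks the bridge to run `work` once. Returns true if it replied within
    // timeout_ms; otherwise kills and restarts the bridge and returns false.
    bool call(int timeout_ms) {
        char byte = 'r';
        if (child_ > 0 && write(to_child_, &byte, 1) == 1) {
            pollfd reply = {from_child_, POLLIN, 0};
            if (poll(&reply, 1, timeout_ms) == 1 &&
                read(from_child_, &byte, 1) == 1)
                return true;
        }
        stop();
        start();
        ++restarts_;
        return false;
    }

    int restarts() const { return restarts_; }

private:
    void start() {
        int request[2], reply[2];
        if (pipe(request) != 0) return;
        if (pipe(reply) != 0) { close(request[0]); close(request[1]); return; }
        child_ = fork();
        if (child_ == 0) {  // child: serve requests until the pipe closes
            close(request[1]);
            close(reply[0]);
            char byte;
            while (read(request[0], &byte, 1) == 1) {
                work_();
                if (write(reply[1], &byte, 1) != 1) break;
            }
            _exit(0);
        }
        close(request[0]);
        close(reply[1]);
        to_child_ = request[1];
        from_child_ = reply[0];
    }

    void stop() {
        if (child_ > 0) {
            kill(child_, SIGKILL);        // cannot be blocked by the hung call
            waitpid(child_, nullptr, 0);  // reap the child, no zombie left
        }
        if (to_child_ >= 0) close(to_child_);
        if (from_child_ >= 0) close(from_child_);
        child_ = -1;
        to_child_ = from_child_ = -1;
    }

    std::function<void()> work_;
    pid_t child_ = -1;
    int to_child_ = -1;
    int from_child_ = -1;
    int restarts_ = 0;
};
```

The main loop would then call something like bridge.call(50) each iteration and, on a false return, reset the device before carrying on. Because one bridge process serves many calls, a new process is only created after a hang, not hundreds of times a second.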

I am only going to answer this part of your text: when a thread running the recalcitrant function hangs, what do I do?

A thread could invoke inline machine instructions. These instructions might clear the interrupt flag. This may cause the code to be non-interruptible. As long as it does not decide to return, you cannot force it to return. You might be able to force it to die (e.g. kill the process containing the thread), but you cannot force the code to return.

I hope my answer convinces you that the answer recommending to use a bridge process is in fact what you should do.

The first thing you do is make sure that it's the library that's buggy. Then you create a minimal example that demonstrates the problem (if possible), and send a bug report and the example to the library's developer. Lastly, you cross your fingers and wait.

What you don't do is put your fingers in your ears and say "LALALALALA" while you hide the problem behind layers of crud in an attempt to pretend the problem is gone.
