
ManualResetEvent.WaitOne() doesn't return if Reset() is called immediately after Set()

I have a problem in a production service which contains a "watchdog" timer used to check whether the main processing job has become frozen (this is related to a COM interop problem which unfortunately can't be reproduced in test).

Here's how it currently works:

  • During processing, the main thread resets a ManualResetEvent, processes a single item (this shouldn't take long), then sets the event. It then continues to process any remaining items.
  • Every 5 minutes, the watchdog calls WaitOne(TimeSpan.FromMinutes(5)) on this event. If the result is false, the service is restarted.
  • Sometimes, during normal operation, the service is being restarted by this watchdog even though processing takes nowhere near 5 minutes.

The cause appears to be that when multiple items await processing, the time between the Set() after the first item is processed and the Reset() before the second item is processed is too brief, and WaitOne() doesn't appear to recognise that the event has been set.

My understanding of WaitOne() is that the blocked thread is guaranteed to receive a signal when Set() is called, but I assume I'm missing something important.

Note that if I allow a context switch by calling Thread.Sleep(0) after calling Set(), WaitOne() never fails.

Included below is a sample which produces the same behaviour as my production code. WaitOne() sometimes waits for 5 seconds and fails, even though Set() is being called every 800 milliseconds.

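using System;
using System.Diagnostics;
using System.Threading;

internal static class Program
{
    private static ManualResetEvent _handle;

    private static void Main(string[] args)
    {
        _handle = new ManualResetEvent(true);

        // Delegate.BeginInvoke runs each loop on a thread-pool thread.
        // It works on .NET Framework only; on .NET Core / .NET 5+ it throws PlatformNotSupportedException.
        ((Action) PeriodicWait).BeginInvoke(null, null);
        ((Action) PeriodicSignal).BeginInvoke(null, null);

        Console.ReadLine();
    }

    // Watchdog: waits up to 5 seconds for the event, reports the result, then pauses about 1 second.
    private static void PeriodicWait()
    {
        Stopwatch stopwatch = new Stopwatch();

        while (true)
        {
            stopwatch.Restart();
            bool result = _handle.WaitOne(5000, false);
            stopwatch.Stop();
            Console.WriteLine("After WaitOne: {0}. Waited for {1}ms", result ? "success" : "failure",
                                stopwatch.ElapsedMilliseconds);
            SpinWait.SpinUntil(() => false, 1000);
        }
    }

    // Worker: resets the event, simulates 800 ms of work, sets the event, then loops straight back to Reset().
    private static void PeriodicSignal()
    {
        while (true)
        {
            _handle.Reset();
            Console.WriteLine("After Reset");
            SpinWait.SpinUntil(() => false, 800);
            _handle.Set();
            // Uncommenting either of the lines below prevents the problem
            //Console.WriteLine("After Set");
            //Thread.Sleep(0);
        }
    }
}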

(Image: output of the code above)


The Question

While I understand that calling Set() closely followed by Reset() doesn't guarantee that all blocked threads will resume, is it also not guaranteed that any waiting threads will be released?

No, this is fundamentally broken code. The odds are only reasonable, not certain, that WaitOne() will complete when you keep the MRE set for such a short amount of time. Windows favors releasing a thread that's blocked on an event. But this fails outright when the thread isn't waiting at that moment, or when the scheduler picks another thread instead, one that runs with a higher priority and also got unblocked. That could be a kernel thread, for example. An MRE doesn't keep a "memory" of having been signaled and not yet waited on.

Neither Sleep(0) nor Sleep(1) is good enough to guarantee that the wait is going to complete; there's no reasonable upper bound on how often the waiting thread could be bypassed by the scheduler. Although you probably ought to shut down the program when it takes longer than 10 seconds ;)
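The "no memory" point can be seen without any threads at all. The following is a minimal sketch, not code from the question: the class and method names (PulseDemo, IsSignaledAfterPulse, IsSignaledAfterSet) are made up for illustration. A zero-timeout WaitOne() only polls the current state of the event, so a Set() that was already followed by Reset() leaves nothing to observe.

```csharp
using System;
using System.Threading;

// Hypothetical helper class, for illustration only.
public static class PulseDemo
{
    // Set() immediately followed by Reset(): the event is non-signaled again
    // before anyone looks at it, so the later wait sees nothing.
    public static bool IsSignaledAfterPulse()
    {
        using (var handle = new ManualResetEvent(false))
        {
            handle.Set();
            handle.Reset();
            return handle.WaitOne(0); // zero timeout: polls the current state only
        }
    }

    // Set() without Reset(): the event stays signaled until someone resets it.
    public static bool IsSignaledAfterSet()
    {
        using (var handle = new ManualResetEvent(false))
        {
            handle.Set();
            return handle.WaitOne(0);
        }
    }
}
```

Calling PulseDemo.IsSignaledAfterPulse() returns false and PulseDemo.IsSignaledAfterSet() returns true. In the question's code the watchdog is usually already blocked in WaitOne() when the pulse happens, which is why it often works, but nothing guarantees that the waiter gets to observe the signaled state before Reset() runs.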

You'll need to do this differently. A simple way is to rely on the worker to eventually set the event. So reset it before you start waiting:

private static void PeriodicWait() {
    Stopwatch stopwatch = new Stopwatch();

    while (true) {
        stopwatch.Restart();
        _handle.Reset();
        bool result = _handle.WaitOne(5000);
        stopwatch.Stop();
        Console.WriteLine("After WaitOne: {0}. Waited for {1}ms", result ? "success" : "failure",
                            stopwatch.ElapsedMilliseconds);
    }
}

private static void PeriodicSignal() {
    while (true) {
        _handle.Set();
        Thread.Sleep(800);   // Simulate work
    }
}

You can't "pulse" an OS event like this.

Among other issues, there's the fact that any OS thread performing a blocking wait on an OS handle can be temporarily interrupted by a kernel-mode APC; when the APC finishes, the thread resumes waiting. If the pulse happened during that interruption, the thread doesn't see it. This is just one example of how "pulses" can be missed (described in detail in Concurrent Programming on Windows, page 231).

BTW, this does mean that the PulseEvent Win32 API is completely broken.

In a .NET environment with managed threads, there's even more possibility of missing a pulse: garbage collection, etc.

In your case, I would consider switching to an AutoResetEvent which is repeatedly Set by the working process and (automatically) reset by the watchdog process each time its Wait completes. And you'd probably want to "tame" the watchdog by only having it check every minute or so.
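A rough sketch of that AutoResetEvent arrangement might look like the following. The Watchdog class and its Beat and WaitForBeat members are hypothetical names chosen for illustration, not part of the question's code or of any library. The worker only ever calls Set(), and the event stays signaled until one wait consumes it, so there is no Set()/Reset() window to miss.

```csharp
using System;
using System.Threading;

// Hypothetical wrapper, for illustration only.
public sealed class Watchdog : IDisposable
{
    private readonly AutoResetEvent _heartbeat = new AutoResetEvent(false);

    // Called by the worker after each processed item.
    // The event remains signaled until a wait consumes it.
    public void Beat()
    {
        _heartbeat.Set();
    }

    // Called by the watchdog. Returns false if no heartbeat arrived
    // within the timeout. A successful wait resets the event automatically.
    public bool WaitForBeat(TimeSpan timeout)
    {
        return _heartbeat.WaitOne(timeout);
    }

    public void Dispose()
    {
        _heartbeat.Dispose();
    }
}
```

The worker would call Beat() after every item, and the watchdog loop would restart the service when WaitForBeat(TimeSpan.FromMinutes(5)) returns false. Several Beat() calls between two checks collapse into a single signal, which is fine for a liveness check. Note that a worker with nothing to process sends no heartbeats either, so it would presumably need to beat while idle as well.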
