
Can boost::mutex lock out an OS if enough are active?

I'm working on a producer-consumer problem with an intermediate processing thread. When I run 200 instances of the application, Windows 7 locks up once many connections time out. Unfortunately, it fails in a way I don't know how to debug: the system becomes unresponsive and I have to restart it with the power button. The same setup works fine on my Mac and, oddly enough, in Windows safe mode.

I'm using boost 1.44 as that is what the host application uses.

Here is my queue. My intention is that the queues are synchronized on their size. I also tried a variant that uses timed_wait to rule out lost notifications, but it made no difference.

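#include <queue>
#include <string>
#include <boost/thread/mutex.hpp>
#include <boost/thread/condition_variable.hpp>

class ConcurrentQueue {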
public:
    void push(const std::string& str, size_t notify_size, size_t max_size);
    std::string pop();

private:
    std::queue<std::string> queue;
    boost::mutex mutex;
    boost::condition_variable cond;
};

void ConcurrentQueue::push(
  const std::string& str, size_t notify_size, size_t max_size) {
    size_t queue_size;
    {
        boost::mutex::scoped_lock lock(mutex);
        // Note: when the queue is already full, the item is silently dropped.
        if (queue.size() < max_size) {
            queue.push(str);
        }
        queue_size = queue.size();
    }
    // Notify only after the lock has been released.
    if (queue_size >= notify_size)
        cond.notify_one();
}

std::string ConcurrentQueue::pop() {
    boost::mutex::scoped_lock lock(mutex);
    // Loop rather than test once: a condition wait can wake spuriously.
    while (queue.empty())
        cond.wait(lock);
    std::string str = queue.front();
    queue.pop();
    return str;
}
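
For reference, the timed-wait variant mentioned above could look roughly like the sketch below. It is not the original code: it uses std::mutex and std::condition_variable (C++11) in place of the boost types so that it stands alone, and the class name TimedQueue, the bool return values and the timeout parameter are assumptions made for illustration. Unlike the original push, it reports when a full queue causes an item to be dropped.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>

// Hypothetical variant of ConcurrentQueue, written with std:: primitives
// instead of boost:: so the sketch is self-contained.
class TimedQueue {
public:
    // Returns false (and drops the item) when the queue is already full.
    bool push(const std::string& str, std::size_t max_size) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (queue_.size() >= max_size)
                return false;
            queue_.push(str);
        }
        // Notify after releasing the lock.
        cond_.notify_one();
        return true;
    }

    // Waits up to `timeout` for an item; returns false if none arrived.
    bool pop(std::string& out, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate overload re-checks the condition after every wakeup,
        // so a spurious wakeup never hands back an item from an empty queue.
        if (!cond_.wait_for(lock, timeout, [this] { return !queue_.empty(); }))
            return false;
        out = queue_.front();
        queue_.pop();
        return true;
    }

private:
    std::queue<std::string> queue_;
    std::mutex mutex_;
    std::condition_variable cond_;
};
```

A consumer would call pop in a loop and treat a false return as "nothing arrived within the timeout", which also gives it a regular point at which to check for shutdown.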

The threads below use these queues to process data and send it with libcurl.

boost::shared_ptr<ConcurrentQueue> queue_a(new ConcurrentQueue);
boost::shared_ptr<ConcurrentQueue> queue_b(new ConcurrentQueue);

void prod_run(size_t iterations) {
    try {
        // stagger startup
        boost::this_thread::sleep(
              boost::posix_time::seconds(random_num(0, 25)));
        size_t save_frequency = random_num(41, 97);
        for (size_t i = 0; i < iterations; i++) {
            // compute
            size_t v = 1;
            for (size_t j = 2; j < (i % 7890) + 4567; j++) {
                v *= j;
                v = std::max(v % 39484, v % 85783);
            }
            // save
            if (i % save_frequency == 0) {
                std::string iv =
                    boost::str(boost::format("%1%=%2%") % i % v);
                queue_a->push(iv, 1, 200);
            }
            sleep_frame();
        }
    } catch (boost::thread_interrupted&) {
    }
}

void prodcons_run() {
    try {
        for (;;) {
            std::string iv = queue_a->pop();
            queue_b->push(iv, 1, 200);
        }
    } catch (boost::thread_interrupted&) {
    }
}

void cons_run() {
    try {
        for (;;) {
            std::string iv = queue_b->pop();
            send_http_post("http://127.0.0.1", iv);
        }
    } catch (boost::thread_interrupted&) {
    }
}

My understanding is that using mutexes this way should not make the system unresponsive. If anything, my applications would deadlock and sleep forever.

Is there some way that running 200 of these at once could create a scenario where that isn't the case?

Update:

When I restart the computer, I usually have to unplug and replug the USB keyboard to get it to respond. Given the comment about drivers, I thought that might be relevant. I tried updating the northbridge drivers, but they were already up to date. I'll check whether other drivers need attention.

Update:

I've watched memory, non-paged pool, CPU, handles and ports, and none of them reaches an alarming level while the system is still responsive. Something may spike right at the end, but I have no way of seeing it.

Update:

When the system hangs, it stops rendering and does not respond to the keyboard, although the last frame it rendered stays on screen. The machine sounds like it is still running, and after the reboot there is nothing in the Event Viewer saying that it crashed. There are no crash dump files either. I interpret this as the OS itself being blocked from executing.

A mutex blocks only the code that tries to acquire that same mutex. Any mutex used internally by the OS should not be (directly) available to any application.

Of course, if the mutex is implemented using OS facilities, locking it may well call into the OS and thus use CPU resources. However, a mutex lock should not cause any worse behaviour than the application using CPU resources in any other way.

Of course, if you use locks inappropriately, different parts of the application can deadlock: function 1 acquires lock A while function 2 acquires lock B. If function 1 then tries to acquire lock B and function 2 tries to acquire lock A before either releases the lock it already holds, neither can proceed. The trick is to always acquire multiple locks in the same order: if you need two locks at the same time, always acquire lock A first, then lock B.
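
A minimal sketch of that ordering rule, assuming two hypothetical locks lock_a and lock_b and C++11 std::mutex rather than boost::mutex: both functions take A first and B second, so neither can end up holding B while waiting for A. The shared counters are only there to give the functions something to protect.

```cpp
#include <mutex>

std::mutex lock_a;   // hypothetical "lock A"
std::mutex lock_b;   // hypothetical "lock B"
int shared_a = 0;    // illustrative data guarded by lock_a
int shared_b = 0;    // illustrative data guarded by lock_b

// Acquires A, then B.
void function_1() {
    std::lock_guard<std::mutex> hold_a(lock_a);
    std::lock_guard<std::mutex> hold_b(lock_b);
    ++shared_a;
    ++shared_b;
}

// Also acquires A, then B. Taking B first here is what would allow
// the deadlock described above.
void function_2() {
    std::lock_guard<std::mutex> hold_a(lock_a);
    std::lock_guard<std::mutex> hold_b(lock_b);
    --shared_a;
    ++shared_b;
}
```

The two functions can then be called from different threads in any interleaving. Where a fixed order is hard to enforce by convention, std::lock can acquire several mutexes in one call using a deadlock-avoidance algorithm.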

Deadlocking should not affect the OS as such. If anything, it reduces the load, because blocked threads use no CPU. But if the application is in some way "misbehaving" while deadlocked, it may cause problems by calling into the OS a lot, e.g. if the locking is done by:

while (!trylock(lock))
{
      // do nothing here (busy-wait)
}

it may cause peaks in system usage.
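
As a hedged illustration of the alternative, the sketch below contrasts a blocking acquisition with a try-lock loop that backs off between attempts. It uses C++11 std::mutex instead of boost::mutex, and the names work_mutex, counter, increment_blocking and increment_with_backoff are hypothetical, as is the 1 ms sleep interval.

```cpp
#include <chrono>
#include <mutex>
#include <thread>

std::mutex work_mutex;  // hypothetical mutex guarding `counter`
int counter = 0;

// Blocking acquisition: a waiting thread is descheduled until the
// mutex becomes available instead of polling for it.
void increment_blocking() {
    std::lock_guard<std::mutex> lock(work_mutex);
    ++counter;
}

// If try_lock has to be used, sleep between attempts rather than
// spinning flat out. Returns the number of failed attempts.
int increment_with_backoff() {
    int failed_attempts = 0;
    while (!work_mutex.try_lock()) {
        ++failed_attempts;
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    ++counter;
    work_mutex.unlock();
    return failed_attempts;
}
```

The blocking form is usually the better default; the backoff loop only makes sense when the thread has other work it could do between attempts.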
