
Early stop of multiple RwLock::write waiting in Rust

My Rust code uses an RwLock to process data in multiple threads. Each thread fills a common storage while holding the read lock (e.g. filling up a database, but my case is a bit different). Eventually, the common storage fills up. I need to pause all processing, reallocate the storage space (e.g. allocate more disk space from the cloud), and continue.

// pseudo-code
fn thread_worker(tasks) {
  let mut lock = rwlock.read().unwrap();
  for task in tasks {
    // please ignore out_of_space check race condition
    // it's here just to explain the question 
    if out_of_space {
      drop(lock);
      let write_lock = rwlock.write().unwrap();
      // get more storage
      drop(write_lock);
      lock = rwlock.read().unwrap();
    }
    // handle task WITHOUT getting a read lock on every pass
    // getting a lock is far costlier than actual task processing
  }
  drop(lock);
}

Since all threads will quickly hit out-of-space at about the same time, they can all release the read lock and request the write lock. The first thread that gets the write lock will fix the storage issue. But now I have a possible temporary deadlock situation: all the other threads are also waiting for the write lock even though they no longer need it.

So it is possible for this situation to happen: given 3 threads all waiting for the write lock, the 1st gets it, fixes the issue, releases it, and waits for the read lock. The 2nd acquires the write lock but quickly skips out because the issue is already fixed, and releases it. The 1st and 2nd threads then acquire the read lock and continue processing, but the 3rd is still waiting for the write lock, and will wait for a very long time until the first two either run out of space again or finish all their work.

Given all threads waiting for the write lock, how can I "abort" the other threads' waits from the first thread after it finishes its work, but before it releases the write lock it already holds?

I saw there is a poisoning feature, but it was designed for panics, and repurposing it for this seems wrong and tricky to get right. Also, the Rust devs are considering removing it.

P.S. Each loop iteration is essentially a data[index] = value assignment, where data is a giant memmap shared by many threads. The index keeps growing in all threads, so eventually all threads run out of memmap size. When that happens, the memmap is destroyed, the file is reallocated, and a new memmap is created. Thus, taking a read lock on every loop iteration is not an option.

You could have an AtomicBool that serves as a gatekeeper for writing: only one thread gets to attempt write() at a time. The trick is that the other threads don't even attempt write(); they just fall back to read(). So instead of "aborting" the other writers, you prevent them from initiating the write() to begin with.

For example, assuming that the "naive" implementation looks like this (playground):

use parking_lot::RwLock;

pub struct Data<T> {
    store: RwLock<T>,
}

impl<T: Default> Data<T> {
    pub fn process(&self, needed_size: usize, f: impl FnOnce(&T)) {
        let mut store = self.store.read();
        if Self::needs_resize(&store, needed_size) {
            drop(store);
            let mut wstore = self.store.write();
            Self::resize(&mut wstore, needed_size);
            drop(wstore);
            store = self.store.read();
        }
        f(&store)
    }
    
    fn needs_resize(_store: &T, _needed_size: usize) -> bool {
        unimplemented!()
    }
    fn resize(store: &mut T, to_size: usize) {
        if !Self::needs_resize(store, to_size) {
            return; // someone else reserved enough for us
        }
        unimplemented!()
    }
}

The implementation that avoids unnecessary writes might then look like this:

use parking_lot::RwLock;
use std::sync::atomic::{AtomicBool, Ordering};

pub struct Data<T> {
    store: RwLock<T>,
    has_writer: AtomicBool,
}

impl<T: Default> Data<T> {
    pub fn process(&self, needed_size: usize, f: impl FnOnce(&T)) {
        loop {
            let mut store = self.store.read();
            if Self::needs_resize(&store, needed_size) {
                drop(store);
                if self.has_writer.swap(true, Ordering::SeqCst) {
                    continue; // someone else is writing, go back to read
                }
                let mut wstore = self.store.write();
                Self::resize(&mut wstore, needed_size);
                drop(wstore);
                self.has_writer.store(false, Ordering::SeqCst);
                store = self.store.read();
            }
            break f(&store);
        }
    }
    // needs_resize() and resize() the same stubs as before
    ...
}

Playground
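
To see the gatekeeper in action without the playground link, here is a hypothetical, self-contained concretization: the same structure as above, with the stubs filled in for a Vec<u8> store that grows on demand. The constructor, the concrete needs_resize/resize bodies, and the thread and iteration counts are illustrative assumptions, not part of the answer itself.

use parking_lot::RwLock;
use std::sync::atomic::{AtomicBool, Ordering};

pub struct Data {
    store: RwLock<Vec<u8>>,
    has_writer: AtomicBool,
}

impl Data {
    pub fn new() -> Self {
        Self {
            store: RwLock::new(Vec::new()),
            has_writer: AtomicBool::new(false),
        }
    }

    pub fn process(&self, needed_size: usize, f: impl FnOnce(&Vec<u8>)) {
        loop {
            let mut store = self.store.read();
            if Self::needs_resize(&store, needed_size) {
                drop(store);
                if self.has_writer.swap(true, Ordering::SeqCst) {
                    continue; // someone else is resizing, go back to read()
                }
                let mut wstore = self.store.write();
                Self::resize(&mut wstore, needed_size);
                drop(wstore);
                self.has_writer.store(false, Ordering::SeqCst);
                store = self.store.read();
            }
            break f(&store);
        }
    }

    fn needs_resize(store: &Vec<u8>, needed_size: usize) -> bool {
        store.len() < needed_size
    }

    fn resize(store: &mut Vec<u8>, to_size: usize) {
        if !Self::needs_resize(store, to_size) {
            return; // someone else already reserved enough for us
        }
        // Grow in large steps so resizes stay rare (stand-in for "grow the file").
        store.resize(to_size.next_power_of_two(), 0);
    }
}

fn main() {
    let data = Data::new();
    std::thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| {
                for i in 0..10_000usize {
                    // Each "task" just checks that enough storage is available.
                    data.process(i + 1, |store| assert!(store.len() > i));
                }
            });
        }
    });
}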

Assuming that the tasks are independent and relatively short, the easiest way to solve this is to not hold a read lock for the whole batch of work, but unlock it for each task:

// pseudo-code
fn thread_worker(tasks) {
  for task in tasks {
    if out_of_space {
      let write_lock = rwlock.write().unwrap();
      // get more storage
      ...
    }

    // process task
    let lock = rwlock.read().unwrap();
    ...
  }
}

Note how this avoids explicit drops, which is usually good practice (RAII). The drop happens automatically at the end of the enclosing block.

This still has the issue you describe, but only for a very short period of time, especially if the number of tasks is much bigger than the number of workers.

To make it even more robust, you might get inspiration from std::sync::Barrier and implement something similar with a Condvar and a counter.
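
For instance, here is a minimal sketch of such a gate built from a Mutex and a Condvar (the names ResizeGate, enter and finish, and the use of a generation counter in place of the Barrier's "round", are illustrative assumptions, not an existing API):

use std::sync::{Condvar, Mutex};

pub struct ResizeGate {
    state: Mutex<GateState>,
    cond: Condvar,
}

struct GateState {
    resizing: bool,  // a reallocation is currently in progress
    generation: u64, // bumped every time a reallocation completes
}

impl ResizeGate {
    pub fn new() -> Self {
        Self {
            state: Mutex::new(GateState { resizing: false, generation: 0 }),
            cond: Condvar::new(),
        }
    }

    // Called by a worker that ran out of space. Exactly one caller gets true
    // (and must reallocate, then call finish()); the others block here until
    // that reallocation is done and get false.
    pub fn enter(&self) -> bool {
        let mut s = self.state.lock().unwrap();
        if !s.resizing {
            s.resizing = true;
            return true; // this thread is the designated resizer
        }
        let gen = s.generation;
        while s.generation == gen {
            s = self.cond.wait(s).unwrap();
        }
        false
    }

    // Called by the designated resizer once more storage is available.
    pub fn finish(&self) {
        let mut s = self.state.lock().unwrap();
        s.resizing = false;
        s.generation += 1;
        self.cond.notify_all();
    }
}

A worker that hits out_of_space would drop its read lock, call gate.enter(), and only if that returns true take the write lock, reallocate, release it, and call gate.finish(); in either case it then reacquires the read lock. Only one thread ever queues on write(), so the stranded-writer scenario from the question cannot occur.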

P.S. Consider an alternative design where, instead of the workers handling the reallocation, it is performed by a separate, dedicated manager task. When the out_of_space condition is detected, you send a signal to that space manager (using a Condvar or an mpsc channel), and it performs the reallocation while the workers sleep and check out_of_space periodically.
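
A minimal, self-contained sketch of that design (the storage is modeled as a plain Vec<u8> behind an RwLock, the channel carries "I need at least this many slots", and all sizes and thread counts are illustrative assumptions; the per-task data[index] = value write is stood in for by a read, since a plain Vec cannot be written through a read guard the way a memmap can):

use std::sync::{mpsc, Arc, RwLock};
use std::thread;

fn main() {
    // Stand-in for the memmapped file.
    let storage = Arc::new(RwLock::new(vec![0u8; 16]));
    let (tx, rx) = mpsc::channel::<usize>(); // "I need at least this many slots"

    // The manager is the only thread that ever takes the write lock.
    let manager_storage = Arc::clone(&storage);
    let manager = thread::spawn(move || {
        for needed in rx {
            let mut store = manager_storage.write().unwrap();
            if store.len() < needed {
                // Placeholder for "destroy memmap, grow file, remap".
                store.resize(needed.next_power_of_two(), 0);
            }
        }
    });

    // Workers never request the write lock; on out-of-space they signal the
    // manager and retry the read lock (which blocks while the manager works).
    let mut workers = Vec::new();
    for _ in 0..4 {
        let storage = Arc::clone(&storage);
        let tx = tx.clone();
        workers.push(thread::spawn(move || {
            for index in 0..10_000usize {
                loop {
                    let store = storage.read().unwrap();
                    if index < store.len() {
                        let _value = store[index]; // stand-in for data[index] = value
                        break;
                    }
                    drop(store);
                    tx.send(index + 1).unwrap(); // out of space: ask the manager
                    thread::yield_now(); // back off, then retry the read lock
                }
            }
        }));
    }

    drop(tx); // the channel closes once the last worker finishes
    for w in workers {
        w.join().unwrap();
    }
    manager.join().unwrap();
}

Because only the manager ever takes the write lock, the stranded-writer problem from the question disappears; the trade-off is that a worker may send a few redundant grow requests, which the manager ignores once the storage is already big enough.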

P.P.S. It is not clear from your description what the read() lock protects (I assumed you need it somehow). If you are fine with processing and storage requests running at the same time, then a plain Mutex also does the job:

// pseudo-code
fn thread_worker(tasks) {
  for task in tasks {
    if out_of_space {
      let write_lock = mutex.lock().unwrap();
      // get more storage
      ...
    }
    // process task
    ...
  }
}

First note that, depending on your target platform, your code may already work as is: for example, on platforms where Rust threads rely on libpthread (e.g. Linux), and on any platform where write locks take precedence over read locks.

If you want a cross-platform solution, all you need to do is switch to parking_lot, which provides a fair implementation of RwLock. In particular, this means that readers trying to acquire the lock will block, even if the lock is unlocked, when there are writers waiting to acquire the lock.
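
As a sketch of how little changes, here is the worker from the question using parking_lot's RwLock (add parking_lot to Cargo.toml); the Vec<u8> storage and the index-based out_of_space check are stand-ins for the memmap, purely for illustration:

use parking_lot::RwLock;

fn thread_worker(storage: &RwLock<Vec<u8>>, tasks: &[usize]) {
    let mut lock = storage.read();
    for &index in tasks {
        if index >= lock.len() {
            // Out of space: release the read lock and request the write lock.
            drop(lock);
            let mut write_lock = storage.write();
            if index >= write_lock.len() {
                // Stand-in for "destroy memmap, grow file, remap".
                write_lock.resize((index + 1).next_power_of_two(), 0);
            }
            drop(write_lock);
            // With a fair lock this read() queues behind any writers that are
            // still waiting, so every thread gets its turn at the write lock.
            lock = storage.read();
        }
        // Handle the task without re-locking on every iteration.
        let _value = lock[index];
    }
}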

Here's the sequence of events with a fair RwLock:

  • Initially all threads are running and hold the read lock.
  • First thread to run out of space releases the read lock and requests the write lock. Since the other threads still hold the read lock, the first thread is blocked.
  • One after the other, the other threads run out of space, release the read lock and request the write lock.
  • Once all the threads have released the read lock, one of them acquires the write lock.
  • The thread that got the write lock allocates more memory, releases the write lock and requests the read lock. Since the other threads waiting on the write lock take precedence, the read request blocks.
  • One after the other, the other threads acquire the write lock, notice that there is memory available, release the write lock and request the read lock.
  • Once all the threads have acquired and released the write lock, they all acquire the read lock and proceed.

Note that there is a theoretical race condition that could keep one of the threads blocked once memory has been allocated, if the other threads are able to proceed in the time it takes that thread to release the read lock and request the write lock, e.g.:

drop(lock);
// Another thread gets the write lock, allocates memory and releases the lock
// All the other threads acquire and release the write lock
// At least one other thread acquires the read lock
let write_lock = rwlock.write().unwrap();

Given the time it takes to allocate memory alone, the probability of this happening in real life is so vanishingly small that it can be discounted.
