My Rust code uses RwLock to process data in multiple threads. Each thread fills a common storage while holding the read lock (e.g. filling up a database, but my case is a bit different). Eventually, the common storage will fill up. I need to pause all processing, reallocate storage space (e.g. allocate more disk space from the cloud), and continue.
// pseudo-code
fn thread_worker(tasks) {
    let mut lock = rwlock.read().unwrap();
    for task in tasks {
        // please ignore the out_of_space check race condition,
        // it's here just to explain the question
        if out_of_space {
            drop(lock);
            let write_lock = rwlock.write().unwrap();
            // get more storage
            drop(write_lock);
            lock = rwlock.read().unwrap();
        }
        // handle task WITHOUT getting a read lock on every pass;
        // getting a lock is far costlier than actual task processing
    }
    drop(lock);
}
Since all threads will hit out of space at about the same time, they can all release the read lock and request the write lock. The first thread that gets the write lock will fix the storage issue. But now I have a possible temporary deadlock situation: all other threads are also waiting for the write lock even though they no longer need it.
So this situation is possible: given 3 threads all waiting for write, the 1st gets the write lock, fixes the issue, releases write, and waits for read. The 2nd enters write but quickly skips because the issue is already fixed, and releases it. The 1st and 2nd threads enter read and continue processing, but the 3rd is still waiting for write and will keep waiting for a very long time, until the first two either run out of space or finish all their work.
Given all threads waiting for write, how can I "abort" all the other threads' waits from the first thread after it finishes its work, but before it releases the write lock it already got?
I saw there is a poisoning feature, but it was designed for panics, and reusing it in production seems wrong and tricky to get right. Also, the Rust devs are thinking of removing it.
PS: Each loop iteration is essentially a data[index] = value assignment, where data is a giant memmap shared by many threads. The index slowly grows in all threads, so eventually all threads run past the memmap size. When that happens, the memmap is destroyed, the file is reallocated, and a new memmap is created. Thus, it is impossible to get a read lock on every loop iteration.
You could have an AtomicBool that serves as a gatekeeper for writing: only one thread gets to attempt write() at a time. The trick is that the other threads don't even attempt write(); they just fall back to read(). So instead of "aborting" the other writers, you prevent them from initiating write() in the first place.
For example, assuming that the "naive" implementation looks like this:
use parking_lot::RwLock;
pub struct Data<T> {
    store: RwLock<T>,
}
impl<T: Default> Data<T> {
    pub fn process(&self, needed_size: usize, f: impl FnOnce(&T)) {
        let mut store = self.store.read();
        if Self::needs_resize(&store, needed_size) {
            drop(store);
            let mut wstore = self.store.write();
            Self::resize(&mut wstore, needed_size);
            drop(wstore);
            store = self.store.read();
        }
        f(&store)
    }
    fn needs_resize(_store: &T, _needed_size: usize) -> bool {
        unimplemented!()
    }
    fn resize(store: &mut T, to_size: usize) {
        if !Self::needs_resize(store, to_size) {
            return; // someone else reserved enough for us
        }
        unimplemented!()
    }
}
The implementation that avoids unnecessary writes might then look like this:
use parking_lot::RwLock;
use std::sync::atomic::{AtomicBool, Ordering};
pub struct Data<T> {
    store: RwLock<T>,
    has_writer: AtomicBool,
}
impl<T: Default> Data<T> {
    pub fn process(&self, needed_size: usize, f: impl FnOnce(&T)) {
        loop {
            let mut store = self.store.read();
            if Self::needs_resize(&store, needed_size) {
                drop(store);
                if self.has_writer.swap(true, Ordering::SeqCst) {
                    continue; // someone else is writing, go back to read
                }
                // only the thread that flipped has_writer gets here
                let mut wstore = self.store.write();
                Self::resize(&mut wstore, needed_size);
                drop(wstore);
                self.has_writer.store(false, Ordering::SeqCst);
                store = self.store.read();
            }
            break f(&store);
        }
    }
    // needs_resize() and resize() are the same stubs as before
    ...
}
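The stubs above leave the storage abstract. Below is a minimal runnable sketch of the same gatekeeper idea under assumptions that are not part of the answer: it uses std::sync::RwLock instead of parking_lot (so guards are unwrapped from a Result), a Vec<AtomicU64> stands in for the memmap (atomics make writing through a shared read guard safe), and GROW_BY, set(), snapshot() and fill_with_gatekeeper() are hypothetical names chosen for illustration.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::{Arc, RwLock};
use std::thread;

// Hypothetical growth step; a real memmap would be re-created at a new size.
const GROW_BY: usize = 16;

pub struct Data {
    // Atomics let threads write slots through a shared read guard.
    store: RwLock<Vec<AtomicU64>>,
    has_writer: AtomicBool,
}

impl Data {
    pub fn new() -> Self {
        Data { store: RwLock::new(Vec::new()), has_writer: AtomicBool::new(false) }
    }

    /// Writes `value` at `index`, growing the store first if it is too small.
    pub fn set(&self, index: usize, value: u64) {
        loop {
            let guard = self.store.read().unwrap();
            if index < guard.len() {
                guard[index].store(value, Ordering::Relaxed);
                return;
            }
            drop(guard);
            // Only the thread that flips the flag from false to true may resize.
            if self.has_writer.swap(true, Ordering::SeqCst) {
                thread::yield_now(); // someone else is resizing; retry the read
                continue;
            }
            {
                let mut w = self.store.write().unwrap();
                // Re-check under the write lock: an earlier resize may suffice.
                while index >= w.len() {
                    let new_len = w.len() + GROW_BY;
                    w.resize_with(new_len, || AtomicU64::new(0));
                }
            } // write guard released here
            self.has_writer.store(false, Ordering::SeqCst);
            // Loop around and do the write under a fresh read lock.
        }
    }

    pub fn snapshot(&self) -> Vec<u64> {
        self.store.read().unwrap().iter().map(|a| a.load(Ordering::Relaxed)).collect()
    }
}

/// Hypothetical driver: each thread writes an interleaved, slowly growing index range.
pub fn fill_with_gatekeeper(threads: usize, per_thread: usize) -> Vec<u64> {
    let data = Arc::new(Data::new());
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let data = Arc::clone(&data);
            thread::spawn(move || {
                for i in 0..per_thread {
                    let index = i * threads + t;
                    data.set(index, index as u64 + 1);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    data.snapshot()
}
```

Threads that lose the swap() race cycle briefly through read() and yield_now() instead of queuing for write(), so at most one thread waits for the write lock at any time.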
Assuming that the tasks are independent and relatively short, the easiest way to solve this is not to hold a read lock for the whole batch of work, but to take and release it for each task:
// pseudo-code
fn thread_worker(tasks) {
    for task in tasks {
        if out_of_space {
            let write_lock = rwlock.write().unwrap();
            // get more storage
            ...
        }
        // process task
        let lock = rwlock.read().unwrap();
        ...
    }
}
Note how this avoids explicit drops, which is usually good practice (RAII): each guard is dropped automatically at the end of the block that uses it.
This still has the same issue, but only for a very short period of time, especially if the number of tasks is much larger than the number of workers.
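As a rough runnable illustration of the per-task variant, here is a sketch that assumes std::sync::RwLock, a Vec<AtomicU64> standing in for the shared storage, tasks modeled as hypothetical (index, value) pairs, and a made-up GROW_BY step. The guards are released by scope rather than by explicit drop() calls. The re-check under the write lock is an addition to the pseudo-code: a thread that obtains write() after someone else has already grown the storage leaves immediately.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;
use std::thread;

const GROW_BY: usize = 16; // hypothetical growth step

/// Each task is a hypothetical (index, value) pair meaning data[index] = value.
fn process_tasks(storage: &RwLock<Vec<AtomicU64>>, tasks: &[(usize, u64)]) {
    for &(index, value) in tasks {
        // The temporary read guard is dropped at the end of this statement.
        let out_of_space = index >= storage.read().unwrap().len();
        if out_of_space {
            let mut w = storage.write().unwrap();
            // Re-check: another worker may already have grown the storage.
            while index >= w.len() {
                let new_len = w.len() + GROW_BY;
                w.resize_with(new_len, || AtomicU64::new(0));
            }
        } // write guard dropped here (RAII), no explicit drop() needed
        let guard = storage.read().unwrap();
        guard[index].store(value, Ordering::Relaxed);
    } // read guard dropped here, before the next task
}

/// Hypothetical driver: threads write interleaved indices concurrently.
fn fill_per_task(threads: usize, per_thread: usize) -> Vec<u64> {
    let storage: RwLock<Vec<AtomicU64>> = RwLock::new(Vec::new());
    thread::scope(|s| {
        for t in 0..threads {
            let storage = &storage;
            s.spawn(move || {
                let tasks: Vec<(usize, u64)> = (0..per_thread)
                    .map(|i| i * threads + t)
                    .map(|index| (index, index as u64))
                    .collect();
                process_tasks(storage, &tasks);
            });
        }
    });
    let store = storage.into_inner().unwrap();
    store.into_iter().map(AtomicU64::into_inner).collect()
}
```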
To make it even more robust, you might take inspiration from std::sync::Barrier and implement something similar with a Condvar and a counter.
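One possible shape for that idea, sketched under assumptions: GrowGate and GateState are hypothetical types in the spirit of std::sync::Barrier (a Mutex-protected counter plus a Condvar), and the storage is again a Vec<AtomicU64> behind a std RwLock. The first thread to run out of space reallocates; the others sleep on the Condvar until the generation counter changes, so only one thread ever waits for the write lock.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Condvar, Mutex, RwLock};
use std::thread;

const GROW_BY: usize = 16; // hypothetical growth step

struct GateState {
    capacity: usize, // storage size known to the gate
    resizing: bool,  // a thread is reallocating right now
    generation: u64, // bumped after every completed reallocation
}

/// Hypothetical helper: one thread reallocates, the others sleep on `done`.
struct GrowGate {
    state: Mutex<GateState>,
    done: Condvar,
}

impl GrowGate {
    fn new(capacity: usize) -> Self {
        GrowGate {
            state: Mutex::new(GateState { capacity, resizing: false, generation: 0 }),
            done: Condvar::new(),
        }
    }

    /// Returns once capacity > index. `grow` runs in one thread at a time
    /// and returns the new capacity.
    fn ensure(&self, index: usize, mut grow: impl FnMut() -> usize) {
        let mut st = self.state.lock().unwrap();
        while index >= st.capacity {
            if st.resizing {
                // Someone else is reallocating: sleep until that round ends.
                let seen = st.generation;
                st = self.done.wait_while(st, |s| s.generation == seen).unwrap();
                continue;
            }
            st.resizing = true;
            drop(st); // do the slow reallocation without holding the mutex
            let new_capacity = grow();
            st = self.state.lock().unwrap();
            st.capacity = new_capacity;
            st.resizing = false;
            st.generation += 1;
            self.done.notify_all();
        }
    }
}

/// Hypothetical driver: each worker holds its read lock for the whole batch.
fn fill_with_gate(threads: usize, per_thread: usize) -> Vec<u64> {
    let storage: RwLock<Vec<AtomicU64>> = RwLock::new(Vec::new());
    let gate = GrowGate::new(0); // must match the initial storage length
    thread::scope(|s| {
        for t in 0..threads {
            let (storage, gate) = (&storage, &gate);
            s.spawn(move || {
                let mut guard = storage.read().unwrap();
                for i in 0..per_thread {
                    let index = i * threads + t;
                    if index >= guard.len() {
                        drop(guard); // never wait on the gate while holding the read lock
                        gate.ensure(index, || {
                            let mut w = storage.write().unwrap();
                            let new_len = w.len() + GROW_BY;
                            w.resize_with(new_len, || AtomicU64::new(0));
                            new_len
                        });
                        guard = storage.read().unwrap();
                    }
                    guard[index].store(index as u64, Ordering::Relaxed);
                }
            });
        }
    });
    let store = storage.into_inner().unwrap();
    store.into_iter().map(AtomicU64::into_inner).collect()
}
```

Note that if grow() panicked, resizing would stay true and the waiters would sleep forever; a production version would need to reset the flag on unwind.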
PS: Consider an alternative design where, instead of the workers handling the reallocation, it is performed by a separate, dedicated management task. When the out_of_space condition is detected, you send a signal to that space manager (using a Condvar or an mpsc channel), and it performs the reallocation while the workers sleep and check out_of_space periodically.
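A minimal sketch of the dedicated-manager design, assuming an mpsc channel for the signal, a Vec<AtomicU64> behind a std RwLock as the storage, and a hypothetical GROW_BY step and fill_with_manager() driver. Workers never call write(): they release their read lock, send the index they need, and poll with a short sleep. The manager thread is the only one that ever takes the write lock.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{mpsc, RwLock};
use std::thread;
use std::time::Duration;

const GROW_BY: usize = 16; // hypothetical growth step

fn fill_with_manager(threads: usize, per_thread: usize) -> Vec<u64> {
    let storage: RwLock<Vec<AtomicU64>> = RwLock::new(Vec::new());
    let (tx, rx) = mpsc::channel::<usize>();
    thread::scope(|s| {
        let storage = &storage;
        // The space manager: the only thread that takes the write lock.
        s.spawn(move || {
            // Ends once every Sender (the workers' clones) has been dropped.
            for needed in rx {
                let mut w = storage.write().unwrap();
                // Already-satisfied requests are cheap no-ops.
                while needed >= w.len() {
                    let new_len = w.len() + GROW_BY;
                    w.resize_with(new_len, || AtomicU64::new(0));
                }
            }
        });
        for t in 0..threads {
            let tx = tx.clone();
            s.spawn(move || {
                let mut guard = storage.read().unwrap();
                for i in 0..per_thread {
                    let index = i * threads + t;
                    if index >= guard.len() {
                        drop(guard);
                        tx.send(index).unwrap(); // signal the manager once...
                        loop {
                            // ...then sleep and re-check periodically.
                            thread::sleep(Duration::from_millis(1));
                            guard = storage.read().unwrap();
                            if index < guard.len() {
                                break;
                            }
                            drop(guard);
                        }
                    }
                    guard[index].store(index as u64, Ordering::Relaxed);
                }
            });
        }
        drop(tx); // only the workers' clones keep the channel open now
    });
    let store = storage.into_inner().unwrap();
    store.into_iter().map(AtomicU64::into_inner).collect()
}
```

The 1 ms polling interval is arbitrary; a Condvar could replace the sleep if wake-up latency matters.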
PPS: It is not clear from your description what the read() lock protects (I assumed you need it somehow). If you are fine with processing and requesting storage running at the same time, then a plain mutex also does the job:
// pseudo-code
fn thread_worker(tasks) {
    for task in tasks {
        if out_of_space {
            let write_lock = mutex.lock().unwrap();
            // get more storage
            ...
        }
        // process task
        ...
    }
}
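For the mutex variant, here is a sketch under the assumption that provisioning more space can run while other threads keep processing, with the provisioned space modeled as an AtomicUsize counter. Space, ensure(), GROW_BY and provision_concurrently() are hypothetical, and the counter inside the Mutex only records how often provisioning ran. The lock-free fast-path check and the re-check under the lock are additions to the pseudo-code above.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

const GROW_BY: usize = 16; // hypothetical provisioning step

struct Space {
    // Space provisioned so far; readable without any lock.
    reserved: AtomicUsize,
    // Serializes provisioning; the counter records how often it ran.
    provision: Mutex<usize>,
}

impl Space {
    fn ensure(&self, needed: usize) {
        if needed < self.reserved.load(Ordering::Acquire) {
            return; // fast path: no lock at all
        }
        let mut calls = self.provision.lock().unwrap();
        // Re-check under the mutex: threads that queued behind the one that
        // just provisioned find nothing left to do and return immediately.
        while needed >= self.reserved.load(Ordering::Acquire) {
            *calls += 1; // stand-in for the slow "get more storage" request
            self.reserved.fetch_add(GROW_BY, Ordering::Release);
        }
    }
}

/// Hypothetical driver returning (reserved space, provisioning calls).
fn provision_concurrently(threads: usize, per_thread: usize) -> (usize, usize) {
    let space = Space { reserved: AtomicUsize::new(0), provision: Mutex::new(0) };
    thread::scope(|s| {
        for t in 0..threads {
            let space = &space;
            s.spawn(move || {
                for i in 0..per_thread {
                    space.ensure(i * threads + t);
                    // ... process the task here ...
                }
            });
        }
    });
    (space.reserved.into_inner(), space.provision.into_inner().unwrap())
}
```

However the threads interleave, provisioning runs exactly as often as a single thread would need, because every extra request is filtered out by the re-check.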
First, note that depending on your target platform, your code may already work as is: for example, on platforms where Rust threads rely on libpthread (e.g. Linux), and on any platform where write locks take precedence over read locks.
If you want a cross-platform solution, all you need to do is switch to parking_lot, which provides a fair implementation of RwLock. In particular, this means that "readers trying to acquire the lock will block even if the lock is unlocked when there are writers waiting to acquire the lock".
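Here's the sequence of events with a fair RwLock, for the three-thread scenario in the question:
1. All three threads release the read lock and request the write lock.
2. The 1st thread gets the write lock, allocates more storage, releases it and requests the read lock.
3. That read request blocks, because the 2nd and 3rd threads are still waiting for the write lock.
4. The 2nd and 3rd threads each get the write lock in turn, find that the storage was already extended, and release it.
5. With no writers left waiting, all threads acquire the read lock and resume processing.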
Note that there is a theoretical race condition that could keep one of the threads blocked once memory has been allocated, if the other threads are able to proceed in the time it takes to release the read lock and request the write lock, e.g.:
drop(lock);
// Another thread gets the write lock, allocates memory and releases the lock
// All the other threads acquire and release the write lock
// At least one other thread acquires the read lock
let write_lock = rwlock.write().unwrap();
Given the time it takes to allocate memory alone, the probability of this happening in real life is so vanishingly small that it can be discounted.