
Does calling `into_inner()` on an atomic take into account all the relaxed writes?

Does into_inner() return all the relaxed writes in this example program? If so, which concept guarantees this?

extern crate crossbeam;

use std::sync::atomic::{AtomicUsize, Ordering};

fn main() {
    let thread_count = 10;
    let increments_per_thread = 100000;
    let i = AtomicUsize::new(0);

    crossbeam::scope(|scope| {
        for _ in 0..thread_count {
            scope.spawn(|| {
                for _ in 0..increments_per_thread {
                    i.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });

    println!(
        "Result of {}*{} increments: {}",
        thread_count,
        increments_per_thread,
        i.into_inner()
    );
}

( https://play.rust-lang.org/?gist=96f49f8eb31a6788b970cf20ec94f800&version=stable )

I understand that crossbeam guarantees that all threads are finished, and since ownership goes back to the main thread, I also understand that there will be no outstanding borrows. But the way I see it, there could still be pending writes, if not in the CPUs, then in the caches.

Which concept guarantees that all writes are finished and all caches are synced back to the main thread when into_inner() is called? Is it possible to lose writes?

Does into_inner() return all the relaxed writes in this example program? If so, which concept guarantees this?

It's not into_inner that guarantees it, it's join.

What into_inner guarantees is that either some synchronization has been performed since the final concurrent write (a join of the thread, the last Arc having been dropped and unwrapped with try_unwrap, etc.), or the atomic was never sent to another thread in the first place. Either case is sufficient to make the read data-race-free.
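For concreteness, here is a minimal sketch of the try_unwrap path mentioned above (the thread and iteration counts are arbitrary illustration values): the workers are joined first, every other Arc clone has been dropped by then, so try_unwrap yields exclusive ownership and into_inner can be called without any possibility of a data race.

use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let counter = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..1000 {
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    // join synchronizes with the completion of each worker.
    for handle in handles {
        handle.join().unwrap();
    }

    // Every clone has been dropped by its worker, so try_unwrap succeeds
    // and we own the AtomicUsize outright; into_inner is then a plain read.
    let counter = Arc::try_unwrap(counter).unwrap();
    println!("{}", counter.into_inner()); // prints 4000
}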

The crossbeam documentation is explicit about using join at the end of a scope:

This [the thread being guaranteed to terminate] is ensured by having the parent thread join on the child thread before the scope exits.

Regarding losing writes:

Which concept guarantees that all writes are finished and all caches are synced back to the main thread when into_inner() is called? Is it possible to lose writes?

As stated in various places in the documentation, Rust inherits the C++ memory model for atomics. In C++11 and later, the completion of a thread synchronizes with the corresponding successful return from join. This means that by the time join completes, all actions performed by the joined thread must be visible to the thread that called join, so it is not possible to lose writes in this scenario.

In terms of atomics, you can think of a join as an acquire read of an atomic that the thread performed a release store on just before it finished executing.
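A rough sketch of that analogy (the flag, counter, and iteration count here are made up for illustration): a Release store on a done flag stands in for the thread finishing, and an Acquire load in a spin loop stands in for join. Once the acquire load observes the release store, all of the relaxed increments that preceded it are guaranteed to be visible.

use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;

// Statics are used here only to keep the sketch short.
static COUNTER: AtomicUsize = AtomicUsize::new(0);
static DONE: AtomicBool = AtomicBool::new(false);

fn main() {
    thread::spawn(|| {
        for _ in 0..1000 {
            COUNTER.fetch_add(1, Ordering::Relaxed);
        }
        // Plays the role of the thread finishing: a release store.
        DONE.store(true, Ordering::Release);
    });

    // Plays the role of join: an acquire load that observes the release store.
    while !DONE.load(Ordering::Acquire) {
        std::hint::spin_loop();
    }

    // The acquire/release pair establishes happens-before, so all 1000
    // relaxed increments are visible at this point.
    assert_eq!(COUNTER.load(Ordering::Relaxed), 1000);
}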

I will include this answer as a potential complement to the other two.

The kind of inconsistency that was mentioned, namely whether some writes could be missing before the final reading of the counter, is not possible here. It would have been undefined behaviour if writes to a value could be postponed until after its consumption with into_inner. However, there are no unexpected race conditions in this program, even without the counter being consumed with into_inner, and even without the help of crossbeam scopes.

Let us write a new version of the program without crossbeam scopes and where the counter is not consumed (Playground):

use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let thread_count = 10;
    let increments_per_thread = 100000;
    let i = Arc::new(AtomicUsize::new(0));
    let threads: Vec<_> = (0..thread_count)
        .map(|_| {
            let i = i.clone();
            thread::spawn(move || for _ in 0..increments_per_thread {
                i.fetch_add(1, Ordering::Relaxed);
            })
        })
        .collect();

    for t in threads {
        t.join().unwrap();
    }

    println!(
        "Result of {}*{} increments: {}",
        thread_count,
        increments_per_thread,
        i.load(Ordering::Relaxed)
    );
}

This version still works pretty well! Why? Because a synchronizes-with relation is established between the ending thread and its corresponding join. And so, as well explained in a separate answer, all actions performed by the joined thread must be visible to the caller thread.

One could probably also wonder whether even the relaxed memory ordering constraint is sufficient to guarantee that the full program behaves as expected. This part is addressed by the Rust Nomicon , emphasis mine:

Relaxed accesses are the absolute weakest. They can be freely re-ordered and provide no happens-before relationship. Still, relaxed operations are still atomic. That is, they don't count as data accesses and any read-modify-write operations done to them occur atomically. Relaxed operations are appropriate for things that you definitely want to happen, but don't particularly otherwise care about. For instance, incrementing a counter can be safely done by multiple threads using a relaxed fetch_add if you're not using the counter to synchronize any other accesses.

The mentioned use case is exactly what we are doing here. Each thread is not required to observe the incremented counter in order to make decisions, and yet all operations are atomic. In the end, the thread joins synchronize with the main thread, thus implying a happens-before relation and guaranteeing that the operations are made visible there. As Rust adopts the same memory model as C++11 (this is implemented by LLVM internally), we can see regarding the C++ std::thread::join function that "The completion of the thread identified by *this synchronizes with the corresponding successful return". In fact, the very same example in C++ is available on cppreference.com as part of the explanation of the relaxed memory order constraint:

#include <vector>
#include <iostream>
#include <thread>
#include <atomic>

std::atomic<int> cnt = {0};

void f()
{
    for (int n = 0; n < 1000; ++n) {
        cnt.fetch_add(1, std::memory_order_relaxed);
    }
}

int main()
{
    std::vector<std::thread> v;
    for (int n = 0; n < 10; ++n) {
        v.emplace_back(f);
    }
    for (auto& t : v) {
        t.join();
    }
    std::cout << "Final counter value is " << cnt << '\n';
}

The fact that you can call into_inner (which consumes the AtomicUsize) means that there are no more borrows on that backing storage.
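A rough sketch of that ownership argument, using the standard library's thread::scope, which plays the same role here as the crossbeam scope in the question (the counts are arbitrary): the borrow checker only allows into_inner to be called once the scope, and with it every borrow of the atomic, has ended.

use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

fn main() {
    let i = AtomicUsize::new(0);

    // Inside the scope, each spawned thread holds a shared borrow of `i`.
    thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| {
                for _ in 0..1000 {
                    i.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
        // Calling i.into_inner() here would not compile: `i` is still
        // borrowed by the spawned threads.
    });

    // The scope has joined every thread and released the borrows, so `i`
    // can now be moved out of and consumed.
    println!("{}", i.into_inner()); // prints 4000
}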

Each fetch_add is an atomic operation with Relaxed ordering, so once the threads are complete there shouldn't be anything that changes it (if there were, that would be a bug in crossbeam).

See the description of into_inner for more info.
