
Are I/O streams really thread-safe?

I wrote a program in which the first thread writes random numbers to a file, and a second thread reads them from that file and writes the prime numbers to another file. A third thread starts and stops the work. I read that I/O streams are thread-safe. Since writing to a single shared resource is thread-safe, what could the problem be? Output: numbers.log is always written correctly, but numbers_prime.log is sometimes empty even though there are prime numbers, and sometimes contains all of them.


#include <iostream>
#include <fstream>
#include <thread>
#include <mutex>
#include <vector>
#include <condition_variable>
#include <future>
#include <random>
#include <chrono>
#include <string>
#include <atomic> // std::atomic, std::atomic_int, std::atomic_bool
#include <cmath>  // sqrt


using namespace std::chrono_literals;

std::atomic_int ITER_NUMBERS = 30;
std::atomic_bool _var = false;
bool ret() { return _var; }
std::atomic_bool _var_log = false;
bool ret_log() { return _var_log; }
std::condition_variable cv;
std::condition_variable cv_log;
std::mutex              mtx;
std::mutex mt;
std::atomic<int> count{0};
std::atomic<bool> _FL = 1;
int MIN = 100;
int MAX = 200;

bool is_empty(std::ifstream& pFile) // function that checks if the file is empty
{
    return pFile.peek() == std::ifstream::traits_type::eof();
}


bool isPrime(int n) // function that checks if the number is prime
{
    if (n <= 1)
        return false;
    
    for (int i = 2; i <= sqrt(n); i++)
        if (n % i == 0)
            return false;
    
    return true;
}


void Log(int min, int max) { // function that generates random numbers and writes them to a file numbers.log
    std::string str;
    std::ofstream log;
    std::random_device seed;
    std::mt19937 gen{seed()};
    std::uniform_int_distribution dist{min, max};
    log.open("numbers.log", std::ios_base::trunc);
    for (int i = 0; i < ITER_NUMBERS; ++i, ++count) {
        std::unique_lock<std::mutex> ulm(mtx);
        cv.wait(ulm,ret);
        str = std::to_string(dist(gen)) + '\n';
        log.write(str.c_str(), str.length());
        log.flush();
        _var_log = true;
        cv_log.notify_one();
        //_var_log = false;
        //std::this_thread::sleep_for(std::chrono::microseconds(500000));
        
    }
    log.close();
    _var_log = true;
    cv_log.notify_one();
    _FL = 0;
}




void printCheck() { // Checking function to start/stop printing
    
    std::cout << "Log to file? [y/n]\n";
    while (_FL) {
        char input;
        std::cin >> input;
        std::cin.clear();
        if (input == 'y') {
            _var = true;
            cv.notify_one();
        }
        if (input == 'n') {
            _var = false;
        }
    }
}

void primeLog() { // a function that reads files from numbers.log and writes prime numbers to numbers_prime.log
    std::unique_lock ul(mt);
    int number = 0;
    std::ifstream in("numbers.log");
    std::ofstream out("numbers_prime.log", std::ios_base::trunc);
    if (is_empty(in)) {
        cv_log.wait(ul, ret_log);
    }
    int oldCount{};
    for (int i = 0; i < ITER_NUMBERS; ++i) {
        if (oldCount == count && count != ITER_NUMBERS) { // check if primeLog is faster than Log. If it is faster, then we wait to continue
            cv_log.wait(ul, ret_log);
            _var_log = false;
        }
        if (!in.eof()) {
            in >> number;
            if (isPrime(number)) {
                out << number;
                out << "\n";
            }
            oldCount = count;
        }
    }
}


int main() {
    std::thread t1(printCheck);
    std::thread t2(Log, MIN, MAX);
    std::thread t3(primeLog);
    t1.join();
    t2.join();
    t3.join();
    return 0;
}

This has nothing to do with the I/O stream thread safety. The shown code's logic is broken.

The shown code seems to follow a design pattern of breaking up a single logical algorithm into multiple pieces, and scattering them far and wide. This makes it more difficult to understand what it's doing. So let's rewrite a little bit of it, to make the logic more clear. In primeLog let's do this instead:

            cv_log.wait(ul, []{ return _var_log; });
            _var_log = false;

It's now clearer that this waits for _var_log to be set before proceeding on its merry way. Once it is set, it is immediately reset.

The code that follows reads exactly one number from the file before looping back here. So primeLog's main loop always handles exactly one number on each iteration.
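For reference, the predicate overload of wait behaves like a while loop around the plain wait. The following is a minimal sketch of the wait-then-reset step in isolation; the names flag, wait_and_reset, set_and_notify and demo_round_trip are hypothetical and not part of the original program.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

std::mutex m;
std::condition_variable cv;
bool flag = false;  // guarded by m; hypothetical stand-in for _var_log

// Blocks until flag is set, then consumes it.
// Returns the value of flag after the reset, which is always false.
bool wait_and_reset() {
    std::unique_lock<std::mutex> lock(m);
    // cv.wait(lock, pred) behaves like: while (!pred()) cv.wait(lock);
    cv.wait(lock, [] { return flag; });
    flag = false;
    return flag;
}

// Sets the flag under the mutex, then wakes one waiter.
void set_and_notify() {
    {
        std::lock_guard<std::mutex> lock(m);
        flag = true;
    }
    cv.notify_one();
}

// Runs one signal/wait round trip and reports the final flag value.
bool demo_round_trip() {
    std::thread setter(set_and_notify);
    bool after = wait_and_reset();
    setter.join();
    return after;
}
```

Because the predicate is checked before blocking, the waiter returns even if the flag was set before it started waiting. What it cannot tell is how many times the flag was set in the meantime.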

The problem now is very easy to see, once we head over to the other side, and do the same clarification:

        std::unique_lock<std::mutex> ulm(mtx);
        cv.wait(ulm, []{ return _var; });

        // Code that generates one number and writes it to the file

        _var_log = true;
        cv_log.notify_one();

Once _var is set to true, it remains true. This loop starts running full blast, iterating continuously. On each iteration it blindly sets _var_log to true and signals the other thread's condition variable.

C++ execution threads are completely independent of each other unless they are explicitly synchronized in some way.

Nothing is preventing this loop from running full blast, getting through its entire number range, before the other execution thread wakes up and decides to read the first number from the file. It'll do that, then go back and wait for its condition variable to be signaled again, for the next number. Its hopes and dreams of the 2nd number will be left unsatisfied.

On each iteration of the generating thread's loop the condition variable, for the other execution thread, gets signaled.

Condition variables are not semaphores. If nothing is waiting on a condition variable when it's signaled -- too bad. When some execution thread decides to wait on a condition variable, it may or may not be immediately woken up.
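One common way to avoid depending on individual notifications is to keep a count of pending items in shared state and wait on that count. The sketch below is only an illustration under that assumption, not the original program's design; run_handoff, available, done and consumed are hypothetical names. It deliberately lets the producer finish before the consumer starts, so every notify_one() finds no waiter.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Produces n items, then consumes them. Returns how many items
// the consumer handled.
int run_handoff(int n) {
    std::mutex m;
    std::condition_variable cv;
    int available = 0;  // produced but not yet consumed (guarded by m)
    bool done = false;  // producer has finished (guarded by m)
    int consumed = 0;

    // Worst case: the producer runs to completion before the consumer
    // exists, so none of these notifications wakes anybody.
    std::thread producer([&] {
        for (int i = 0; i < n; ++i) {
            {
                std::lock_guard<std::mutex> lock(m);
                ++available;
            }
            cv.notify_one();
        }
        {
            std::lock_guard<std::mutex> lock(m);
            done = true;
        }
        cv.notify_one();
    });
    producer.join();

    std::thread consumer([&] {
        for (;;) {
            std::unique_lock<std::mutex> lock(m);
            // The predicate inspects the shared count, so it does not
            // matter whether a notification was ever received.
            cv.wait(lock, [&] { return available > 0 || done; });
            if (available == 0) break;  // done and nothing left
            --available;
            ++consumed;
        }
    });
    consumer.join();
    return consumed;
}
```

With a single boolean flag the consumer would see "something happened" once; with a count it can tell that all n items are still waiting for it.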

One of these two execution threads relies on receiving a condition variable notification for every iteration of its loop.

The logic in the other execution thread fails to provide this guarantee. This may not be the only flaw. There might be others, subject to further analysis; this was just the most apparent one.

Thanks to those who wrote about read-behind-write; now I know more. But that was not the problem. The main problem was that when the file is new (and therefore empty), calling pFile.peek() in the is_empty function sets eofbit on the stream, and the flag stays set until it is explicitly cleared. Thus, until the end of the program, in.rdstate() == std::ios_base::eofbit, and the if (!in.eof()) check in primeLog never passes.

Fix: reset the flag state.

if (is_empty(in)) {
    cv_log.wait(ul, ret_log);
}
in.clear(); // reset state
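The effect of peek() on the stream state can be reproduced without files or threads. The following is a minimal sketch that uses a std::stringstream as a stand-in for the file stream (an assumption made to keep the example deterministic; the state flags work the same way for std::ifstream). EofDemo and run_eof_demo are hypothetical names.

```cpp
#include <istream>
#include <sstream>

// Same check as in the question, generalized to any input stream.
bool is_empty(std::istream& in) {
    return in.peek() == std::istream::traits_type::eof();
}

struct EofDemo {
    bool was_empty;
    bool eof_after_peek;
    bool read_ok_before_clear;
    bool read_ok_after_clear;
    int value;
};

EofDemo run_eof_demo() {
    std::stringstream s;  // starts out empty, like a freshly created file
    EofDemo r{};

    r.was_empty = is_empty(s);   // peek() hits end of data...
    r.eof_after_peek = s.eof();  // ...and leaves eofbit set

    // Stand-in for the writer adding data later. Writing through the
    // buffer directly does not touch the stream's state flags.
    s.rdbuf()->sputn("137\n", 4);

    int n = 0;
    // Fails: the stream is not good(), so nothing is extracted.
    r.read_ok_before_clear = static_cast<bool>(s >> n);

    s.clear();  // reset eofbit and failbit
    r.read_ok_after_clear = static_cast<bool>(s >> n);
    r.value = n;
    return r;
}
```

The data is present in both read attempts; only the stale state flags make the first one fail.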

There was also a problem specific to reading and writing one file from different threads. It was not the cause of the error above, but it led to another one.

If, on a later run of the program, primeLog() opens std::ifstream in("numbers.log") before Log() executes log.open("numbers.log", std::ios_base::trunc), then in can read the previous run's data into its buffer before log.open truncates the file. As a result, the old data is read and written to numbers_prime.log.
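One possible way to rule out this ordering problem is to let the reader open the file only after the writer reports that it has truncated it. The sketch below uses std::promise and std::future for that signal; this is an assumption about how one might fix it, not what the original program does. read_after_truncate and the path passed to it are hypothetical. To keep the demo deterministic, the writer signals after its first flush rather than immediately after truncation.

```cpp
#include <fstream>
#include <future>
#include <string>
#include <thread>
#include <vector>

// Returns the numbers the reader saw in the file at `path`.
std::vector<int> read_after_truncate(const std::string& path) {
    // Simulate a previous run that left stale data behind.
    {
        std::ofstream stale(path, std::ios_base::trunc);
        stale << "4\n6\n";
    }

    std::promise<void> truncated;
    std::future<void> ready = truncated.get_future();
    std::vector<int> seen;

    std::thread writer([&] {
        std::ofstream log(path, std::ios_base::trunc);  // old data is gone
        log << "101\n103\n";
        log.flush();
        truncated.set_value();  // only now may the reader open the file
    });

    std::thread reader([&] {
        ready.wait();  // blocks until the writer has truncated and flushed
        std::ifstream in(path);
        int n;
        while (in >> n) seen.push_back(n);
    });

    writer.join();
    reader.join();
    return seen;
}
```

Truncating the file in main before any thread is started would give the same ordering with less machinery.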
