Are I/O streams really thread-safe?

Question

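I wrote a program in which the first thread writes random numbers to a file, and a second thread reads them back and writes the ones that are prime to another file. A third thread stops and starts the work. I read that I/O streams are thread-safe. Since writing to a single shared resource is thread-safe, what could the problem be? Output: the records in numbers.log are always correct, but when there are primes, numbers_prime.log sometimes contains no records and sometimes contains all of them.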

#include <iostream>
#include <fstream>
#include <thread>
#include <mutex>
#include <vector>
#include <condition_variable>
#include <future>
#include <random>
#include <chrono>
#include <string>
#include <atomic> // std::atomic_int, std::atomic_bool
#include <cmath>  // sqrt


using namespace std::chrono_literals;

std::atomic_int ITER_NUMBERS = 30;
std::atomic_bool _var = false;
bool ret() { return _var; }
std::atomic_bool _var_log = false;
bool ret_log() { return _var_log; }
std::condition_variable cv;
std::condition_variable cv_log;
std::mutex              mtx;
std::mutex mt;
std::atomic<int> count{0};
std::atomic<bool> _FL = 1;
int MIN = 100;
int MAX = 200;

bool is_empty(std::ifstream& pFile) // function that checks if the file is empty
{
    return pFile.peek() == std::ifstream::traits_type::eof();
}


bool isPrime(int n) // function that checks if the number is prime
{
    if (n <= 1)
        return false;
    
    for (int i = 2; i <= sqrt(n); i++)
        if (n % i == 0)
            return false;
    
    return true;
}


void Log(int min, int max) { // function that generates random numbers and writes them to a file numbers.log
    std::string str;
    std::ofstream log;
    std::random_device seed;
    std::mt19937 gen{seed()};
    std::uniform_int_distribution dist{min, max};
    log.open("numbers.log", std::ios_base::trunc);
    for (int i = 0; i < ITER_NUMBERS; ++i, ++count) {
        std::unique_lock<std::mutex> ulm(mtx);
        cv.wait(ulm,ret);
        str = std::to_string(dist(gen)) + '\n';
        log.write(str.c_str(), str.length());
        log.flush();
        _var_log = true;
        cv_log.notify_one();
        //_var_log = false;
        //std::this_thread::sleep_for(std::chrono::microseconds(500000));
        
    }
    log.close();
    _var_log = true;
    cv_log.notify_one();
    _FL = 0;
}




void printCheck() { // Checking function to start/stop printing
    
    std::cout << "Log to file? [y/n]\n";
    while (_FL) {
        char input;
        std::cin >> input;
        std::cin.clear();
        if (input == 'y') {
            _var = true;
            cv.notify_one();
        }
        if (input == 'n') {
            _var = false;
        }
    }
}

void primeLog() { // a function that reads numbers from numbers.log and writes the prime ones to numbers_prime.log
    std::unique_lock ul(mt);
    int number = 0;
    std::ifstream in("numbers.log");
    std::ofstream out("numbers_prime.log", std::ios_base::trunc);
    if (is_empty(in)) {
        cv_log.wait(ul, ret_log);
    }
    int oldCount{};
    for (int i = 0; i < ITER_NUMBERS; ++i) {
        if (oldCount == count && count != ITER_NUMBERS) { // check if primeLog is faster than Log. If it is faster, then we wait to continue
            cv_log.wait(ul, ret_log);
            _var_log = false;
        }
        if (!in.eof()) {
            in >> number;
            if (isPrime(number)) {
                out << number;
                out << "\n";
            }
            oldCount = count;
        }
    }
}


int main() {
    std::thread t1(printCheck);
    std::thread t2(Log, MIN, MAX);
    std::thread t3(primeLog);
    t1.join();
    t2.join();
    t3.join();
    return 0;
}

Answer 1

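This has nothing to do with I/O stream thread safety. The logic of the shown code is broken.

The shown code appears to follow the design pattern of breaking a single logical algorithm into several pieces and scattering them far and wide. That makes it harder to understand what it is doing. So let's rewrite it a little to make the logic clearer. In primeLog, let's do this instead: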

            cv_log.wait(ul, []{ return _var_log; });
            _var_log = false;

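Now it is clearer that this waits for _var_log to be set before proceeding on its merry way. Once it is set, it is immediately reset.

The code that follows reads one number from the file and then loops back here. So primeLog's main loop always handles exactly one number on each iteration.

The problem becomes easy to see once we turn to the other side and make the same clarification: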

        std::unique_lock<std::mutex> ulm(mtx);
        cv.wait(ulm, []{ return _var; });

        // Code that generates one number and writes it to the file

        _var_log = true;
        cv_log.notify_one();

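Once _var is set to true, it stays true (unless the user enters n). This loop starts running at full speed, iterating continuously. On each iteration it blindly sets _var_log to true and signals the other thread's condition variable.

C++ execution threads are completely independent of each other unless they are explicitly synchronized in some way.

Nothing prevents this loop from running to completion, through its entire range of numbers, before the other execution thread wakes up and decides to read the first number from the file. That thread will read it, then go back and wait for its condition variable to be signaled again for the next number. Its hopes and dreams of a second number will go unfulfilled.

On each iteration of the generating thread's loop, the other execution thread's condition variable gets signaled.

Condition variables are not semaphores. If nothing is waiting on the condition variable when it is signaled: too bad. When some execution thread later decides to wait on the condition variable, it may or may not be woken up immediately.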
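To make the point about missed notifications concrete, here is a minimal sketch. The function name flag_survives_missed_notification is hypothetical and not part of the original program. It signals a condition variable while nobody is waiting, then waits with a predicate. The notification itself is gone by then; the wait succeeds only because the predicate inspects state that was recorded under the mutex.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper for illustration only.
// Returns true if a waiter that arrives late still observes the signal.
bool flag_survives_missed_notification() {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;  // shared state, protected by m

    {
        std::lock_guard<std::mutex> guard(m);
        ready = true;    // record the event in shared state
    }
    cv.notify_one();     // nobody is waiting yet, so this wakes no one

    std::unique_lock<std::mutex> lock(m);
    // The predicate is checked before blocking, so the wait returns at once
    // because `ready` is already true, not because of the earlier notify_one().
    return cv.wait_for(lock, std::chrono::milliseconds(50), [&] { return ready; });
}
```

The same reasoning shows why a single boolean is not enough in the question's code: thirty signals sent before the reader wakes up collapse into one "true", so the reader learns that something happened but not how many times.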

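One of the two execution threads relies on receiving a condition variable notification on every iteration of its loop.

The logic in the other execution thread fails to provide this guarantee. This may not be the only flaw; there may be others, subject to further analysis. This is just the most obvious one.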
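One common way to provide that guarantee is to stop counting notifications and instead keep the pending work itself in shared state. The following is a sketch under that assumption, not the original author's fix: transfer_all is a hypothetical helper, and an in-memory std::queue stands in for the numbers.log file. The consumer drains everything that has accumulated each time it wakes, so it does not matter how far ahead the producer runs.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical helper: hands `count` integers from a producer thread to a
// consumer thread and returns what the consumer received.
std::vector<int> transfer_all(int count) {
    std::mutex m;
    std::condition_variable cv;
    std::queue<int> pending;  // protected by m
    bool done = false;        // protected by m
    std::vector<int> received;

    std::thread consumer([&] {
        std::unique_lock<std::mutex> lock(m);
        for (;;) {
            // Wake when there is work or when the producer has finished.
            cv.wait(lock, [&] { return !pending.empty() || done; });
            // Drain everything that piled up, however many items that is.
            while (!pending.empty()) {
                received.push_back(pending.front());
                pending.pop();
            }
            if (done) return;
        }
    });

    std::thread producer([&] {
        for (int i = 0; i < count; ++i) {
            {
                std::lock_guard<std::mutex> guard(m);
                pending.push(i);
            }
            cv.notify_one();
        }
        {
            std::lock_guard<std::mutex> guard(m);
            done = true;
        }
        cv.notify_one();
    });

    producer.join();
    consumer.join();
    return received;
}
```

Calling transfer_all(30) should return all thirty values in order regardless of which thread gets scheduled first, because the queue, not the notification, carries the information.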

Answer 2

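Thanks to those who wrote about read-behind-write; now I know more. But that was not the problem. The main problem is that if the file is new, the call to pFile.peek() in is_empty sets eofbit on the stream, and the flag stays set. So until the program ends, in.rdstate() == std::ios_base::eofbit, every if (!in.eof()) check in the loop fails, and nothing is ever read.

Fix: reset the stream state.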

    if (is_empty(in)) {
        cv_log.wait(ul, ret_log);
    }
    in.clear(); // reset state
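The effect can be reproduced without files. In the sketch below, a shared std::stringbuf stands in for numbers.log (an assumption made so the example is self-contained), and EofDemo and demonstrate_sticky_eof are hypothetical names. Peeking at the empty buffer sets eofbit, the next extraction fails even though data has arrived in the meantime, and the read succeeds only after clear().

```cpp
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical result type for illustration only.
struct EofDemo {
    bool eof_after_peek;      // eofbit set by peek() on an empty source
    bool read_without_clear;  // did extraction work with eofbit still set?
    bool read_after_clear;    // did extraction work after clear()?
    int value;                // the value read after clear()
};

EofDemo demonstrate_sticky_eof() {
    std::stringbuf shared;     // stands in for the file (assumption)
    std::istream in(&shared);  // the "reader" side
    std::ostream out(&shared); // the "writer" side
    EofDemo result{};

    in.peek();                 // nothing to read yet: sets eofbit on `in`
    result.eof_after_peek = in.eof();

    out << 101 << '\n';        // the writer adds data afterwards

    int n = 0;
    // The stream is not good(), so the extraction is refused outright.
    result.read_without_clear = static_cast<bool>(in >> n);

    in.clear();                // reset eofbit (and the failbit just set)
    result.read_after_clear = static_cast<bool>(in >> n);
    result.value = n;
    return result;
}
```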

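There is also an issue specific to reading and writing one file from different threads. It was not the cause of this bug, but it leads to another one.

If, when I run the program again, primeLog() opens std::ifstream in("numbers.log") for reading before log.open("numbers.log", std::ios_base::trunc) runs, then in loads the old data into its buffer before log.open erases it with the std::ios_base::trunc flag. As a result, old data is read and written to numbers_prime.log.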
