
Multi-Threading with files

Say I have the following code, where I open a file, read its contents line by line, pass each line to a function somewhere else, and rewind the file when I'm done.

FILE *file = Open_File();
char line[max];
while (!EndofFile()) 
{
    int length = GetLength(line);
    if (length > 0) 
    {
       DoStuffToLine(line);
    }
}
rewind(file);

I'm wondering whether there is a way to use threads here to add concurrency. Since I'm only reading the file and not writing to it, I feel like I don't have to worry about race conditions. However, I'm not sure how to handle the code in the while loop: if one thread is looping over the file while another thread loops over it at the same time, would they cause each other to skip lines, hit other errors, and so on? What's a good way to approach this?

If you're trying to do this to improve read performance, you will likely be disappointed, since this will almost surely be disk I/O bound. Adding more threads won't help the OS and disk controller fetch data any faster.

However, if you're just trying to process the data in parallel, that's another matter. In that case, I would read the entire file into a memory buffer, then have your threads process it in parallel. That way you don't have to worry about thread safety when rewinding the file pointer, or any similar annoyances.
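A minimal sketch of that idea in C with POSIX threads, under some assumptions: the names read_file, split_lines, worker and process_file_parallel are made up for illustration, and summing the line lengths is only a stand-in for whatever DoStuffToLine does in the question. The file is read once by a single thread, split into lines in memory, and each thread then works on its own contiguous share of the lines, so no thread ever touches the FILE pointer.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_THREADS 16

/* Read the whole file into a NUL-terminated heap buffer; caller frees it. */
static char *read_file(const char *path, size_t *out_len)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    char *buf = NULL;
    if (fseek(f, 0, SEEK_END) == 0) {
        long size = ftell(f);
        rewind(f);
        if (size >= 0 && (buf = malloc((size_t)size + 1)) != NULL) {
            *out_len = fread(buf, 1, (size_t)size, f);
            buf[*out_len] = '\0';
        }
    }
    fclose(f);
    return buf;
}

/* Split buf in place at each '\n'; returns an array of line pointers. */
static char **split_lines(char *buf, size_t len, size_t *out_count)
{
    size_t count = 0, cap = 16;
    char **lines = malloc(cap * sizeof *lines);
    char *p = buf, *end = buf + len;
    while (lines && p < end) {
        char *nl = memchr(p, '\n', (size_t)(end - p));
        if (nl) *nl = '\0';
        if (count == cap) {
            char **tmp = realloc(lines, (cap *= 2) * sizeof *lines);
            if (!tmp) { free(lines); return NULL; }
            lines = tmp;
        }
        lines[count++] = p;
        p = nl ? nl + 1 : end;
    }
    *out_count = count;
    return lines;
}

struct slice {
    char **lines;      /* first line of this thread's share */
    size_t count;      /* number of lines in the share */
    size_t total_len;  /* result, written only by the owning thread */
};

static void *worker(void *arg)
{
    struct slice *s = arg;
    for (size_t i = 0; i < s->count; i++) {
        size_t length = strlen(s->lines[i]);
        if (length > 0)
            s->total_len += length;   /* stand-in for DoStuffToLine(line) */
    }
    return NULL;
}

/* Returns the summed length of all lines, or -1 if the file can't be loaded. */
static long process_file_parallel(const char *path, int nthreads)
{
    size_t len = 0, count = 0;
    char *buf = read_file(path, &len);
    if (!buf) return -1;
    char **lines = split_lines(buf, len, &count);
    if (!lines) { free(buf); return -1; }
    if (nthreads < 1) nthreads = 1;
    if (nthreads > MAX_THREADS) nthreads = MAX_THREADS;

    pthread_t tids[MAX_THREADS];
    struct slice slices[MAX_THREADS];
    int started[MAX_THREADS];
    size_t per = count / nthreads, extra = count % nthreads, start = 0;
    for (int t = 0; t < nthreads; t++) {
        size_t n = per + ((size_t)t < extra ? 1 : 0);
        slices[t] = (struct slice){ lines + start, n, 0 };
        start += n;
        started[t] = pthread_create(&tids[t], NULL, worker, &slices[t]) == 0;
        if (!started[t]) worker(&slices[t]);   /* fall back to running inline */
    }
    long total = 0;
    for (int t = 0; t < nthreads; t++) {
        if (started[t]) pthread_join(tids[t], NULL);
        total += (long)slices[t].total_len;
    }
    free(lines);
    free(buf);
    return total;
}
```

Because each thread writes only to its own slice, no lock is needed here; the results are combined after pthread_join. A call such as process_file_parallel("input.txt", 4) (file name hypothetical) would split the lines across four threads.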

You'll likely still need other locking mechanisms for the multithreaded parts, of course, depending on exactly what you're doing, but you shouldn't have to worry about what the standard library will do when multiple threads access the same file.
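As an illustration of the kind of locking that may still be needed, here is a small sketch, assuming the lines are already in memory and the threads update one shared result. The names shared_total, job, count_nonempty and count_nonempty_two_threads are hypothetical. Each thread counts non-empty lines in its own half, then takes a mutex once to add its local count to the shared total.

```c
#include <pthread.h>
#include <stddef.h>

struct shared_total {
    pthread_mutex_t lock;
    size_t nonempty;          /* shared result, guarded by lock */
};

struct job {
    const char **lines;       /* this thread's share of the lines */
    size_t count;
    struct shared_total *out;
};

static void *count_nonempty(void *arg)
{
    struct job *j = arg;
    size_t local = 0;         /* private to this thread, no lock needed */
    for (size_t i = 0; i < j->count; i++)
        if (j->lines[i][0] != '\0')
            local++;
    pthread_mutex_lock(&j->out->lock);    /* one short critical section */
    j->out->nonempty += local;
    pthread_mutex_unlock(&j->out->lock);
    return NULL;
}

/* Run two threads over the two halves of lines[] and return the shared count. */
static size_t count_nonempty_two_threads(const char **lines, size_t count)
{
    struct shared_total total;
    total.nonempty = 0;
    pthread_mutex_init(&total.lock, NULL);

    struct job jobs[2] = {
        { lines, count / 2, &total },
        { lines + count / 2, count - count / 2, &total },
    };
    pthread_t tids[2];
    int started[2];
    for (int t = 0; t < 2; t++) {
        started[t] = pthread_create(&tids[t], NULL, count_nonempty, &jobs[t]) == 0;
        if (!started[t]) count_nonempty(&jobs[t]);   /* run inline on failure */
    }
    for (int t = 0; t < 2; t++)
        if (started[t]) pthread_join(tids[t], NULL);
    pthread_mutex_destroy(&total.lock);
    return total.nonempty;
}
```

Accumulating into a local variable and locking once per thread, rather than once per line, keeps the time spent waiting on the mutex small.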

Concurrency adds some race condition problems:

1. EndofFile() is evaluated at the start of the loop. It can happen that it reports "not at end" to two threads, then one thread reaches the end of the file and the other still attempts to read. You never know when a given thread will be running;
2. The same holds for GetLength: once a thread has the length, it may already be stale, because another thread may have read another line;
3. You are reading the file sequentially, so even if you rewind it, the current position of the I/O pointer can be changed at any time by another thread.
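If several threads really must pull lines from one shared FILE, one way to avoid the races listed above is to make "check for end of file" and "read the next line" a single locked step. The sketch below assumes POSIX threads; line_reader, next_line, reader_thread and read_with_threads are hypothetical names, and the 256-byte buffer is an arbitrary choice (longer lines would arrive in several pieces). POSIX stdio calls lock the stream individually, but a separate EOF test followed by a read is still two steps, which is why they are merged here.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

struct line_reader {
    FILE *file;
    pthread_mutex_t lock;
    size_t lines_handed_out;  /* guarded by lock */
};

/* Fetch the next line into buf. Returns 1 on success, 0 at end of file
   or on error. The EOF test and the read happen under one lock. */
static int next_line(struct line_reader *r, char *buf, int size)
{
    pthread_mutex_lock(&r->lock);
    int ok = fgets(buf, size, r->file) != NULL;
    if (ok) r->lines_handed_out++;
    pthread_mutex_unlock(&r->lock);
    return ok;
}

static void *reader_thread(void *arg)
{
    struct line_reader *r = arg;
    char line[256];                       /* each thread has its own buffer */
    while (next_line(r, line, sizeof line)) {
        size_t length = strcspn(line, "\n");
        if (length > 0) {
            /* DoStuffToLine(line) from the question would go here */
        }
    }
    return NULL;
}

/* Let up to 8 threads drain the file; returns the number of lines read. */
static size_t read_with_threads(FILE *file, int nthreads)
{
    struct line_reader r;
    r.file = file;
    r.lines_handed_out = 0;
    pthread_mutex_init(&r.lock, NULL);

    pthread_t tids[8];
    int started = 0;
    if (nthreads > 8) nthreads = 8;
    while (started < nthreads &&
           pthread_create(&tids[started], NULL, reader_thread, &r) == 0)
        started++;
    if (started == 0) reader_thread(&r);  /* no thread started: read inline */
    for (int t = 0; t < started; t++)
        pthread_join(tids[t], NULL);
    pthread_mutex_destroy(&r.lock);
    return r.lines_handed_out;
}
```

Every line is delivered to exactly one thread, but the reads themselves are still serialized by the lock, so this only pays off when the per-line processing is much more expensive than the read.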

Furthermore, as Telgin pointed out, reading a file is not CPU bound but I/O bound, so the speed is set by how fast the system can read the file. Threads won't improve that, because you need some locks, and locking to guarantee thread safety just introduces overhead.

I'm not sure this is the best approach, but you could read the file once, store its contents in two separate objects, and then read the objects instead of the file. Just make sure to clean up afterward.
