
Should multiple threads read from the same DataInputStream?

I'd like my program to take a file and then create four files based on its byte content.

Working with only the main thread, I just create one DataInputStream and process it sequentially.

Now I'm interested in making my program concurrent. Maybe I can have four threads, one for each file to be created.

I don't want to read all of the file's bytes into memory at once, so my threads will need to query the DataInputStream continually, streaming the bytes with read().

What is not clear to me is this: should my four threads call read() on the same DataInputStream, or should each one have its own separate stream to read from?

I don't think this is a good idea. See http://download.java.net/jdk7/archive/b123/docs/api/java/io/DataInputStream.html

"DataInputStream is not necessarily safe for multithreaded access. Thread safety is optional and is the responsibility of users of methods in this class."

Assuming you want all of the data to be available to each of your four new files, each thread should create its own DataInputStream.

If the threads share a single DataInputStream, at best each thread will get some random quarter of the data. At worst, you'll get a crash or data corruption, because multiple threads are calling code that is not thread safe.
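The "random quarter" effect can be observed with a small experiment. The sketch below is only an illustration, and the class and method names are made up for it. It deliberately wraps a ByteArrayInputStream, whose read() is synchronized, so no byte is lost or read twice; with other underlying streams that guarantee does not necessarily hold.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class SharedStreamDemo {
    // Starts `threads` workers that all call read() on ONE shared stream and
    // returns how many bytes each worker ended up consuming.
    static int[] countBytesPerThread(byte[] data, int threads) throws InterruptedException {
        DataInputStream shared = new DataInputStream(new ByteArrayInputStream(data));
        int[] counts = new int[threads];
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                try {
                    while (shared.read() != -1) {
                        counts[id]++; // each slot is written by exactly one thread
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join(); // after join() the worker's writes are visible here
        }
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] counts = countBytesPerThread(new byte[1_000_000], 4);
        for (int i = 0; i < counts.length; i++) {
            System.out.println("thread " + i + " read " + counts[i] + " bytes");
        }
    }
}
```

The per-thread counts add up to the input size, but how the bytes are divided between the threads depends on scheduling and usually changes from run to run. No thread sees the whole file.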

If you want to read the data from one file into four separate files, you should not share a bare DataInputStream. You can, however, wrap the stream and add functionality that makes it thread safe.

For example, you could read a chunk of data from the DataInputStream and cache that small chunk. Once all four threads have read the chunk, you can dispose of it and continue reading. You never have to load the complete file into memory, only a small amount at a time.
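One possible way to sketch the chunk idea is shown below. It is not the only way to wrap the stream: here a single reader thread copies each chunk into a small bounded queue per worker, and a chunk becomes garbage once every worker has taken it. The class name, the queue capacity of 4, and the handler interface are assumptions made for the example.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

class ChunkBroadcaster {
    private static final byte[] END = new byte[0]; // sentinel: no more chunks

    // Reads `source` in chunks of at most chunkSize bytes (chunkSize > 0) and
    // hands every chunk to every handler; each handler runs on its own thread.
    static void broadcast(InputStream source, int chunkSize, List<Consumer<byte[]>> handlers)
            throws IOException, InterruptedException {
        List<BlockingQueue<byte[]>> queues = new ArrayList<>();
        List<Thread> workers = new ArrayList<>();
        for (Consumer<byte[]> handler : handlers) {
            // A small bounded queue keeps only a few chunks in memory per worker.
            BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(4);
            queues.add(queue);
            Thread worker = new Thread(() -> {
                try {
                    for (byte[] chunk = queue.take(); chunk != END; chunk = queue.take()) {
                        handler.accept(chunk); // handlers must treat the chunk as read-only
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(worker);
            worker.start();
        }
        try (DataInputStream in = new DataInputStream(source)) {
            byte[] buffer = new byte[chunkSize];
            int n;
            while ((n = in.read(buffer)) != -1) {
                byte[] chunk = Arrays.copyOf(buffer, n); // private copy shared by all workers
                for (BlockingQueue<byte[]> queue : queues) {
                    queue.put(chunk); // blocks while a slow worker's queue is full
                }
            }
        } finally {
            for (BlockingQueue<byte[]> queue : queues) {
                queue.put(END);
            }
        }
        for (Thread worker : workers) {
            worker.join();
        }
    }
}
```

Only the reader thread ever touches the DataInputStream, so the stream itself needs no locking. The sketch does not handle a handler that throws: its queue would fill up and the reader would block, so real code would need some failure signalling.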

If you look at the documentation for DataInputStream, you'll see that it is a FilterInputStream, which means its read operations are delegated to another InputStream. Suppose the underlying stream is a FileInputStream: on most platforms, concurrent reads of the same file through separate streams are supported.

So in your case, you should create four separate FileInputStreams, wrap each one in its own DataInputStream, and use one per thread. The reads will not interfere with each other.
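A minimal sketch of that arrangement follows. The question does not say how bytes map to output files, so the rule used here (byte value modulo 4) is purely a placeholder, as are the class and method names. A BufferedInputStream is added so that single-byte read() calls do not each hit the file.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class PerThreadStreams {
    // Worker body: opens a private stream on `source` and copies the bytes this
    // worker is responsible for into `target`.
    static void extract(Path source, Path target, int id) throws IOException {
        try (DataInputStream in = new DataInputStream(
                 new BufferedInputStream(new FileInputStream(source.toFile())));
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(target))) {
            int b;
            while ((b = in.read()) != -1) {
                if (b % 4 == id) { // placeholder rule, not from the question
                    out.write(b);
                }
            }
        }
    }

    // Starts one thread per target file; every thread reads the whole source
    // through its own stream, so no stream is ever shared.
    static void splitInParallel(Path source, List<Path> targets) throws InterruptedException {
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < targets.size(); i++) {
            final int id = i;
            Thread worker = new Thread(() -> {
                try {
                    extract(source, targets.get(id), id);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
    }
}
```

The trade-off is that the source file is read four times, once per thread. That is often acceptable when the operating system caches the file, but it is worth measuring.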

The short answer is no.

The longer answer: have a single thread read the DataInputStream and put the data into one of four queues, one per output file. Decide which queue to use based on the byte content.

Then have four more threads, each reading from one queue and writing to the corresponding output file.
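As a rough sketch of this design, assuming a placeholder routing rule (byte value modulo the number of queues) and made-up class and method names:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class QueueSplitter {
    private static final int END = -1; // sentinel: no more bytes for this queue

    static void split(Path source, List<Path> targets) throws IOException, InterruptedException {
        List<BlockingQueue<Integer>> queues = new ArrayList<>();
        List<Thread> writers = new ArrayList<>();
        for (Path target : targets) {
            BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(8192); // bounded on purpose
            queues.add(queue);
            // Writer thread: drains its own queue into its own output file.
            Thread writer = new Thread(() -> {
                try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(target))) {
                    for (int b = queue.take(); b != END; b = queue.take()) {
                        out.write(b);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            writers.add(writer);
            writer.start();
        }
        // The calling thread is the only reader of the DataInputStream.
        try (DataInputStream in = new DataInputStream(
                 new BufferedInputStream(Files.newInputStream(source)))) {
            int b;
            while ((b = in.read()) != -1) {
                queues.get(b % queues.size()).put(b); // placeholder routing rule
            }
        } finally {
            for (BlockingQueue<Integer> queue : queues) {
                queue.put(END);
            }
        }
        for (Thread writer : writers) {
            writer.join();
        }
    }
}
```

Queueing one boxed Integer per byte keeps the example short but is slow. Passing byte[] batches through the queues would likely perform much better. Like any sketch of this kind, it also does not recover if a writer thread fails while the reader is blocked on a full queue.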

