
Reading from disk and processing in parallel

This is probably the most basic, maybe even stupid, question here. We often talk about using multithreading for better resource utilization. For example, an application reads and processes files from the local file system. Let's say that reading a file from disk takes 5 seconds and processing it takes 2 seconds.

In this scenario, we say that using two threads, one to read and the other to process, will save time, because while one thread is processing the first file, the other thread can already start reading the second file in parallel.

Question: Is this because of the way CPUs are designed? That is, are there separate processing and read/write units, so that these two threads can work in parallel even on a single-core machine because they are actually handled by different modules? Or does this need multiple cores?

Sorry if this is a stupid question. :)

In theory, yes. A single core gives you the same overlap. One thread waits for the read from the file to finish (I/O wait) while another thread processes a file that has already been read. The first thread cannot be in the running state until the I/O operation completes, so, roughly speaking, it uses no CPU resources in that state. The second thread consumes the CPU and completes its task. A multi-core CPU will, of course, perform better.

To start with, there is a difference between concurrency and parallelism. In theory, a single-core machine does not support parallelism; it can only interleave threads.

As for whether concurrency (using threads) improves performance, that depends heavily on the implementation. Take Android or Swing, for instance. Both have a main thread (the UI thread). Doing a large calculation on the main thread blocks the UI and makes it unresponsive. From a layman's perspective, that counts as bad performance.

In your case (I am assuming there is no UI thread), whether you benefit from delegating the processing to another thread depends on many factors, especially how your threads are implemented. For example, threads that have to synchronize with each other will not perform as well as unsynchronized ones. Your problem statement reminds me of the classic producer-consumer problem. Because you need synchronized threads, using threads may not be the best choice for this work. IMO it's better to do all the reading and processing in a single thread.

Multithreading also has a context-switching cost. It is smaller than the cost of a context switch between processes, but it is still there. See this link.

[EDIT] For a producer-consumer scenario like this, you should preferably use a BlockingQueue.
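To make the BlockingQueue suggestion concrete, here is a minimal sketch of one reader thread handing file contents to one processor thread. Everything named here is an assumption for illustration only: `ReadProcessPipeline`, `readFile`, `process`, the queue capacity of 4 and the end-of-input marker are hypothetical, and `readFile` just fabricates a string where a real disk read would go.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class ReadProcessPipeline {
    // Unique marker object that tells the processor there is no more input.
    private static final String END_OF_INPUT = new String("<end>");

    // Hypothetical stand-in for a slow disk read.
    static String readFile(String name) {
        return "contents of " + name;
    }

    // Hypothetical stand-in for the CPU-bound processing step.
    static String process(String content) {
        return content.toUpperCase(Locale.ROOT);
    }

    static List<String> run(List<String> fileNames) {
        // Bounded queue: the reader blocks in put() if it gets too far ahead.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(4);
        List<String> results = new ArrayList<>();

        Thread reader = new Thread(() -> {
            try {
                for (String name : fileNames) {
                    queue.put(readFile(name));
                }
                queue.put(END_OF_INPUT);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread processor = new Thread(() -> {
            try {
                while (true) {
                    String content = queue.take(); // blocks until data arrives
                    if (content == END_OF_INPUT) { // identity check on the marker
                        break;
                    }
                    results.add(process(content));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        reader.start();
        processor.start();
        try {
            // join() makes the processor's writes to results visible here.
            reader.join();
            processor.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a.txt", "b.txt")));
    }
}
```

The queue does the synchronization between the two threads, so neither thread needs explicit locks. With a single processor thread, results come out in the same order the files were read.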

On a single processor, multithreading is achieved through time slicing. One thread does some work, then the processor switches to the other thread.

When a thread is waiting on some I/O, such as a file read, it gives up its CPU time slice early, allowing another thread to make use of the CPU.

The result is improved overall throughput compared to a single thread, even on a single core.

Key for the diagrams below:

  • = Doing work on CPU
  • - I/O
  • _ Idle

Single thread:

====--====--====--====--

Two threads:

====--__====--__====--__
____====--__====--__====

So you can see how more gets done in the same time, because the CPU is kept busy where it would previously have been left waiting. The storage device is also used more.
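The same reasoning can be applied to the numbers in the question (5 seconds to read, 2 seconds to process). The sketch below is only a back-of-the-envelope model, with these assumptions: one reader thread and one processor thread, reads run back to back on the disk, a file can be processed only after it has been fully read, and context-switch overhead is ignored. The class name `PipelineTiming` and the count of 10 files are made up for illustration.

```java
final class PipelineTiming {
    // One thread reads a file, then processes it, then moves to the next.
    static long sequentialSeconds(int files, long readSeconds, long processSeconds) {
        return files * (readSeconds + processSeconds);
    }

    // One reader thread and one processor thread working as a pipeline.
    static long pipelinedSeconds(int files, long readSeconds, long processSeconds) {
        long readDone = 0;    // time at which the current file finishes reading
        long processDone = 0; // time at which the current file finishes processing
        for (int i = 0; i < files; i++) {
            readDone += readSeconds; // reads run back to back
            // Processing starts once the file is read and the processor is free.
            processDone = Math.max(readDone, processDone) + processSeconds;
        }
        return processDone;
    }

    public static void main(String[] args) {
        int files = 10; // hypothetical number of files
        System.out.println("sequential: " + sequentialSeconds(files, 5, 2) + " s");
        System.out.println("pipelined:  " + pipelinedSeconds(files, 5, 2) + " s");
    }
}
```

Under these assumptions, 10 files take 70 seconds with one thread and 52 seconds with two, because only the processing of the last file is not hidden behind a read. The slower stage (here, the disk) sets the pace, so extra cores would not shorten this particular workload much.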
