
When should I use critical sections?

Here's the deal. My app has a lot of threads that do the same thing - read specific data from huge files (>2 GB), parse the data, and eventually write to that file.

The problem is that sometimes one thread reads X from file A while a second thread writes to X of that same file A. Would a problem occur?

The I/O code uses TFileStream for every file. I split the I/O code out to be local (a static class), because I'm afraid there will be a problem. Since it's split out, there should be critical sections.

Every case below is local (static) code that is not instantiated.

Case 1:

procedure Foo(obj:TObject);
begin ... end;

Case 2:

procedure Bar(obj:TObject);
var i: integer;
begin
  for i:=0 to X do ...{something}
end;

Case 3:

function Foo(obj:TObject; j:Integer):TSomeObject;
var i:integer;
begin
  for i:=0 to X do
    for j:=0 to Y do
      Result:={something}
end;

Question 1: In which case do I need critical sections, so that there are no problems if more than one thread calls it at the same time?

Question 2: Will there be a problem if Thread 1 reads X (entry) from file A while Thread 2 writes X (entry) to file A?

When should I use critical sections? I try to imagine it in my head, but it's hard - only one thread :))

EDIT

Is this going to suit it?

{a class for every 2GB file}

TSpecificFile = class
  cs: TCriticalSection;
  ...
end;

TFileParser = class
  SpecificFile: TSpecificFile;  // 'file' is a reserved word in Delphi, so the field is renamed
  procedure ParseThis; procedure ParseThat; ...
end;

function Read(AFile: TSpecificFile): TSomeObject;
begin
  AFile.cs.Enter;
  try
    ...//read
  finally
    AFile.cs.Leave;
  end;
end;

function Write(AFile: TSpecificFile): TSomeObject;
begin
  AFile.cs.Enter;
  try
    //write
  finally
    AFile.cs.Leave;
  end;
end;

Now will there be a problem if two threads call Read with:

case 1: same TSpecificFile

case 2: different TSpecificFile?

Do I need another critical section?

In general, you need a locking mechanism (critical sections are a locking mechanism) whenever multiple threads may access a shared resource at the same time, and at least one of the threads will be writing to / modifying the shared resource.
This is true whether the resource is an object in memory or a file on disk.
The reason the locking is necessary is that if a read operation happens concurrently with a write operation, the read operation is likely to obtain inconsistent data, leading to unpredictable behaviour.
Stephen Cheung has mentioned the platform-specific considerations with regard to file handling, and I'll not repeat them here.

As a side note, I'd like to highlight another concurrency concern that may be applicable in your case.

  • Suppose one thread reads some data and starts processing.
  • Then another thread does the same.
  • Both threads determine that they must write a result to position X of File A.
  • At best the values to be written are the same, and one of the threads effectively did nothing but waste time.
  • At worst, the calculation of one of the threads is overwritten, and the result is lost.

You need to determine whether this would be a problem for your application. And I must point out that if it is, just locking the read and write operations will not solve it. Furthermore, trying to extend the duration of the locks leads to other problems.

Options

Critical Sections

Yes, you can use critical sections.

  • You will need to choose the best granularity for the critical sections: one per whole file, or perhaps use them to designate specific blocks within a file.
  • The decision would require a better understanding of what your application does, so I'm not going to answer for you.
  • Just be aware of the possibility of deadlocks (a lock-ordering sketch follows this list):
    • Thread 1 acquires lock A
    • Thread 2 acquires lock B
    • Thread 1 desires lock B, but has to wait
    • Thread 2 desires lock A - causing a deadlock because neither thread is able to release its acquired lock.
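The usual way to defuse that last scenario is to make every thread take multiple locks in one fixed, global order. Here is a minimal sketch of the idea against the TSpecificFile class from the question; the LockOrder field is a hypothetical, unchanging ordering key added purely for illustration:

uses
  SyncObjs;

// Assumes each TSpecificFile exposes its cs: TCriticalSection plus a
// hypothetical LockOrder: Integer that is unique per file and never changes.
procedure LockPair(A, B: TSpecificFile);
begin
  // Every thread takes the lower-ordered lock first, so two threads can
  // never grab the same pair of locks in opposite orders and deadlock.
  if A.LockOrder <= B.LockOrder then
  begin
    A.cs.Enter;
    B.cs.Enter;
  end
  else
  begin
    B.cs.Enter;
    A.cs.Enter;
  end;
end;

procedure UnlockPair(A, B: TSpecificFile);
begin
  // Release order does not matter for correctness.
  A.cs.Leave;
  B.cs.Leave;
end;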

I'm also going to suggest 2 other tools for you to consider in your solution.

Single-Threaded

What a shocking thing to say! But seriously, if your reason for going multi-threaded was "to make the application faster", then you went multi-threaded for the wrong reason. Most people who do that actually end up making their applications more difficult to write, less reliable, and slower!

It is a far too common misconception that multiple threads speed up applications. If a task requires X clock-cycles to perform, it will take X clock-cycles! Multiple threads don't speed up tasks; they permit multiple tasks to be done in parallel. But this can be a bad thing! ...

You've described your application as being highly dependent on reading from disk, parsing what's read, and writing to disk. Depending on how CPU-intensive the parsing step is, you may find that all your threads spend the majority of their time waiting for disk I/O operations. In that case, the multiple threads generally only serve to shunt the disk heads to the far 'corners' of your (ummm, round) disk platters. Disk I/O is still the bottleneck, and the threads make it behave as if the files are maximally fragmented.

Queueing Operations

Let's suppose your reasons for going multi-threaded are valid, and you do still have threads operating on shared resources. Instead of using locks to avoid concurrency issues, you could queue your shared-resource operations onto specific threads.

So instead of Thread 1:

  • Reading position X from File A
  • Parsing the data
  • Writing to position Y in File A

Create another thread; the FileA thread (a sketch follows the list below):

  • the FileA thread has a queue of instructions
  • When it gets to the instruction to read position X, it does so.
  • It sends the data to Thread 1
  • Thread 1 parses its data --- while the FileA thread continues processing instructions
  • Thread 1 places an instruction to write its result to position Y at the back of the FileA thread's queue --- while the FileA thread continues to process other instructions.
  • Eventually the FileA thread will write the data as required by Thread 1.
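Here is one way such a FileA thread could be sketched in Delphi, assuming a version that has TThreadedQueue<T> (Delphi XE or later); with older versions a TQueue guarded by its own critical section plus an event works the same way. The TFileCommand record, the callback, and the names are illustrative assumptions, and shutdown handling is only hinted at:

uses
  Classes, SysUtils, SyncObjs, Generics.Collections;

type
  TFileOp = (foRead, foWrite);

  // One queued instruction; the exact layout is an assumption for illustration.
  TFileCommand = record
    Op: TFileOp;
    Offset: Int64;
    Size: Integer;
    Data: TBytes;            // payload for writes, result buffer for reads
    OnDone: TProc<TBytes>;   // how read data gets back to the requesting thread
  end;

  // Owns the TFileStream; no other thread ever touches it.
  TFileThread = class(TThread)
  private
    FQueue: TThreadedQueue<TFileCommand>;
    FFileName: string;
  protected
    procedure Execute; override;
  public
    constructor Create(const AFileName: string);
    destructor Destroy; override;
    procedure Enqueue(const ACmd: TFileCommand);
  end;

constructor TFileThread.Create(const AFileName: string);
begin
  FFileName := AFileName;
  FQueue := TThreadedQueue<TFileCommand>.Create(1000);
  inherited Create(False);   // Execute starts once construction is complete
end;

destructor TFileThread.Destroy;
begin
  FQueue.Free;
  inherited;
end;

procedure TFileThread.Enqueue(const ACmd: TFileCommand);
begin
  FQueue.PushItem(ACmd);     // callers may be on any thread; the queue is thread-safe
end;

procedure TFileThread.Execute;
var
  Stream: TFileStream;
  Cmd: TFileCommand;
begin
  Stream := TFileStream.Create(FFileName, fmOpenReadWrite or fmShareDenyWrite);
  try
    // PopItem blocks until an instruction arrives; calling FQueue.DoShutDown
    // at shutdown (not shown) makes it return wrAbandoned so the loop ends.
    while FQueue.PopItem(Cmd) = wrSignaled do
    begin
      Stream.Position := Cmd.Offset;
      case Cmd.Op of
        foRead:
          begin
            SetLength(Cmd.Data, Cmd.Size);
            Stream.ReadBuffer(Cmd.Data[0], Cmd.Size);
            if Assigned(Cmd.OnDone) then
              Cmd.OnDone(Cmd.Data);   // hand the bytes back to the requesting thread
          end;
        foWrite:
          Stream.WriteBuffer(Cmd.Data[0], Length(Cmd.Data));
      end;
    end;
  finally
    Stream.Free;
  end;
end;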

Synchronization is only needed for shared data that can cause a problem (or an error) if more than one agent is doing something with it.

Obviously the file writing operation should be wrapped in a critical section for that file only if you don't want other writer processes to trample on the new data before the write is completed -- the file may no longer be consistent if half of the new data is modified by another process that does not see the other half of the new data (which hasn't been written out by the original writer process yet). Therefore you'll have a collection of CS's, one for each file. That CS should be released as soon as you're done with writing.
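One possible shape for that collection of critical sections, assuming files are identified by name; the MapLock/FileLocks names and the lazy creation are illustrative, and startup/cleanup code is omitted:

uses
  SyncObjs, Generics.Collections;

var
  MapLock: TCriticalSection;                        // protects the dictionary itself
  FileLocks: TDictionary<string, TCriticalSection>; // one CS per file

// Returns the critical section for a given file, creating it on first use.
function LockForFile(const FileName: string): TCriticalSection;
begin
  MapLock.Enter;
  try
    if not FileLocks.TryGetValue(FileName, Result) then
    begin
      Result := TCriticalSection.Create;
      FileLocks.Add(FileName, Result);
    end;
  finally
    MapLock.Leave;
  end;
end;

A writer would then bracket its write with LockForFile(Name).Enter and a matching Leave in a finally block, releasing the lock as soon as the write completes, exactly as described above.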

In certain cases, e.g. memory-mapped files or sparse files, the O/S may allow you to write to different portions of the file at the same time. Therefore, in such cases, your CS will have to be on a particular segment of the file. Thus you'll have a collection of CS's (one for each segment) for each file.

If you write to a file and read it at the same time, the reader may get inconsistent data. In some O/S's, reading is allowed to happen simultaneously with a write (perhaps the read comes from cached buffers). However, if you are writing to a file and reading it at the same time, what you read may not be correct. If you need consistent data on reads, then the reader should also be subject to the critical section.

In certain cases, if you are writing to one segment and reading from another segment, the O/S may allow it. However, whether this will return correct data usually cannot be guaranteed, because you can't always tell whether two segments of the file may be residing in one disk sector, or other low-level O/S things.

So, in general, the advice is to wrap any file operation in a CS, per file.

Theoretically, you should be able to read simultaneously from the same file, but locking it in a CS will only allow one reader. In that case, you'll need to separate your implementation into "read locks" and "write locks" (similar to a database system). This is highly non-trivial though, as you'll then have to deal with promoting different levels of locks.
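For what it's worth, Delphi ships a ready-made multiple-readers/single-writer lock, TMultiReadExclusiveWriteSynchronizer in SysUtils, which can stand in for a plain critical section when concurrent readers matter. A minimal sketch with one such lock per file; the FileLock variable and the two procedures are placeholders:

uses
  SysUtils;

var
  FileLock: TMultiReadExclusiveWriteSynchronizer;  // one per file, like the CS

procedure ReadEntry;
begin
  FileLock.BeginRead;      // any number of readers may hold this at the same time
  try
    // ...read from the TFileStream
  finally
    FileLock.EndRead;
  end;
end;

procedure WriteEntry;
begin
  FileLock.BeginWrite;     // exclusive: waits for readers and other writers
  try
    // ...write to the TFileStream
  finally
    FileLock.EndWrite;
  end;
end;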

After note: the kind of thing you're trying to do (reading and writing huge data sets, GBs in size, simultaneously in segments) is what is typically done in a database. You should be looking into breaking your data files into database records. Otherwise, you either suffer from non-optimized read/write performance due to locking, or you end up re-inventing the relational database.

Conclusion first

You don't need TCriticalSection. You should implement a Queue-based algorithm that guarantees no two threads are working on the same piece of data, without blocking.

How I got to that conclusion

First of all, Windows (Win 7?) will allow you to simultaneously write to a file as many times as you see fit. I have no idea what it does with the writes, and I'm clearly not saying it's a good idea, but I've just done the following test to prove that Windows allows simultaneous multiple writes to the same file:

I made a thread that opens a file for writing (with "share deny none") and keeps writing random stuff to a random offset for 30 seconds. Here's a pastebin with the code.
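The pastebin link did not survive the copy, but a test of the kind described above might look roughly like this; the buffer size and offset range are made up, and the target file is assumed to already exist:

uses
  Classes, SysUtils, Diagnostics;

// Starts one writer thread; the test runs several of these at once against
// the same file. Not a recommended pattern, just a demonstration that
// Windows accepts the concurrent writes.
// (Call Randomize once at program start if you want different data per run.)
procedure StartRandomWriter(const FileName: string);
begin
  TThread.CreateAnonymousThread(
    procedure
    var
      Stream: TFileStream;
      Watch: TStopwatch;
      Buf: array[0..255] of Byte;
      i: Integer;
    begin
      Stream := TFileStream.Create(FileName, fmOpenReadWrite or fmShareDenyNone);
      try
        Watch := TStopwatch.StartNew;
        while Watch.ElapsedMilliseconds < 30000 do      // keep writing for 30 seconds
        begin
          for i := 0 to High(Buf) do
            Buf[i] := Random(256);                      // random payload
          Stream.Position := Random(100 * 1024 * 1024); // random offset
          Stream.WriteBuffer(Buf, SizeOf(Buf));
        end;
      finally
        Stream.Free;
      end;
    end).Start;
end;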

Why a TCriticalSection would be bad

A critical section only allows one thread to access the protected resource at any given time. You have two options: only hold the lock for the duration of the read/write operation, or hold the lock for the entire time required to process the given resource. Both have serious problems.

Here's what might happen if a thread holds the lock only for the duration of the read/write operations:

  • Thread 1 acquires the lock, reads the data, releases the lock
  • Thread 2 acquires the lock, reads the same data, releases the lock
  • Thread 1 finishes processing, acquires the lock, writes the data, releases the lock
  • Thread 2 acquires the lock, writes the data, and here's the oops: Thread 2 has been working on old data, since Thread 1 made changes in the background!

Here's what might happen if a thread holds the lock for the entire round-trip read & write operation:

  • Thread 1 acquires the lock, starts reading data
  • Thread 2 tries to acquire the same lock, gets blocked...
  • Thread 1 finishes reading the data, processes the data, writes the data back to file, releases the lock
  • Thread 2 acquires the lock and starts processing the same data again!

The Queue solution

Since you're multi-threading, and you can have multiple threads simultaneously processing data from the same file, I assume the data is somehow "context free": you can process the 3rd part of a file before processing the 1st. This must be true, because if it's not, you can't multi-thread (or you are limited to 1 thread per file).

Before you start processing, you can prepare a number of "Jobs" that look like this:

  • File 'file1.raw', offset 0, 1024 KB
  • File 'file1.raw', offset 1024, 1024 KB
  • ...
  • File 'fileN.raw', offset 99999999, 1024 KB

Put all those "jobs" in a queue. 将所有这些“工作”放入队列中。 Have your threads dequeue one Job from the queue and process it. 让您的线程将一个Job从队列中排队并处理它。 Since no two jobs overlap, threads don't need to synchronize with each other, so you don't need the critical section. 由于没有两个作业重叠,因此线程不需要彼此同步,因此您不需要临界区。 You only need the critical section to protect access to the Queue itself. 您只需要关键部分来保护对Queue本身的访问。 Windows makes sure threads can read and write to/from the files just fine, as long as they stick to the allocated "Job". Windows确保线程可以正常读取和写入文件,只要它们坚持分配的“作业”即可。
