CopyToAsync weird behavior when used from multiple threads

Question

I have the following function to write to a file asynchronously from multiple threads in parallel:

```csharp
// Assumed declarations: fileName and rwl are not shown in the original question
static string fileName = "output.txt"; // hypothetical path
static readonly ReaderWriterLock rwl = new ReaderWriterLock(); // System.Threading

static long startOffset = 0; // This variable will store the offset at which the thread begins to write
static int blockSize = 10; // size of block written by each thread

static async Task<long> WriteToFile(Stream dataToWrite)
{
    var offset = GetStartOffset(); // Definition of this function is given later
    using (var fs = new FileStream(fileName,
                FileMode.OpenOrCreate,
                FileAccess.ReadWrite,
                FileShare.ReadWrite))
    {
        fs.Seek(offset, SeekOrigin.Begin);
        await dataToWrite.CopyToAsync(fs);
    }
    return offset;
}

/**
 * I use a reader/writer lock here so that only one thread can access
 * the value of startOffset at a time
 */
static long GetStartOffset()
{
    rwl.AcquireWriterLock(Timeout.Infinite); // acquire before try so we never release an unheld lock
    try
    {
        long result = startOffset;
        startOffset += blockSize; // increment startOffset for the next thread
        return result;
    }
    finally
    {
        rwl.ReleaseWriterLock();
    }
}
```

I then call the above function to write some strings from multiple threads:

```csharp
var tasks = new List<Task>();
for (int i = 1; i <= 4; i++)
{
    tasks.Add(Task.Run(async () =>
    {
        string s = "aaaaaaaaaa";
        byte[] buffer = new byte[10];
        buffer = Encoding.Default.GetBytes(s);
        Stream data = new MemoryStream(buffer);
        long offset = await WriteToFile(data);
        Console.WriteLine($"Data written at offset - {offset}");
    }));
}

Task.WaitAll(tasks.ToArray());
```

Now, this code works fine most of the time. But occasionally, at random, it writes some Japanese characters or other symbols into the file. Is there something I am doing wrong with the multithreading?

Answer 1

Your calculation of startOffset assumes that each thread writes exactly 10 bytes. There are several issues with this.

One, the data has unknown length:

```csharp
byte[] buffer = new byte[10];
buffer = Encoding.Default.GetBytes(s);
```

The assignment doesn't put data into the newly allocated 10-byte array. It discards the new byte[10] array (which will be garbage collected) and stores a reference to the array returned by GetBytes(s), which could have any length at all. It could overflow into the next task's area. Or it could leave behind content that already existed in the file (you use OpenOrCreate) which lies in the current task's area but past the end of the actual dataToWrite.
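To make the "unknown length" point concrete, here is a minimal sketch (not code from the question) showing that the byte count returned by GetBytes depends on the text and the encoding, and reserving each region from the actual byte length instead of a fixed blockSize. It swaps the question's ReaderWriterLock for Interlocked.Add and assumes UTF-8; the names OffsetReservation, Reserve and nextOffset are hypothetical.

```csharp
using System;
using System.Text;
using System.Threading;

static class OffsetReservation
{
    // Next free byte offset in the file (hypothetical; stands in for startOffset/blockSize).
    static long nextOffset = 0;

    // Atomically reserves `length` bytes and returns where the reserved region starts.
    // Interlocked.Add returns the updated value, so subtracting `length` gives the old one.
    public static long Reserve(long length) => Interlocked.Add(ref nextOffset, length) - length;

    static void Main()
    {
        byte[] plain = Encoding.UTF8.GetBytes("aaaaaaaaaa");                // 10 chars, 10 bytes
        byte[] accented = Encoding.UTF8.GetBytes(new string('\u00E4', 10)); // 10 chars, 20 bytes

        Console.WriteLine($"{plain.Length} {accented.Length}"); // 10 20

        // Each task reserves exactly as many bytes as it is about to write.
        long first = Reserve(plain.Length);
        long second = Reserve(accented.Length);
        long third = Reserve(plain.Length);
        Console.WriteLine($"{first} {second} {third}"); // 0 10 30
    }
}
```

Because the 20-byte string gets a 20-byte region, it cannot spill into the region reserved for the next task.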

Two, you try to seek past the areas that other threads are expected to write to, but if those writes haven't completed yet, they haven't increased the file length. So you attempt to seek past the end of the file, which the Windows API allows but which might cause problems with the .NET wrappers. However, the FileStream.Seek documentation does indicate you are OK:

"When you seek beyond the length of the file, the file size grows."

This might not be precisely correct, though, since the Windows API documentation says:

"It is not an error to set a file pointer to a position beyond the end of the file. The size of the file does not increase until you call the SetEndOfFile, WriteFile, or WriteFileEx function. A write operation increases the size of the file to the file pointer position plus the size of the buffer written, which results in the intervening bytes uninitialized."
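A small sketch of that situation, assuming a scratch file in the temp directory (the path and the WriteAt helper are hypothetical): a single write at offset 20 into an empty file, simulating a later task that finishes before the earlier ones, produces a 30-byte file whose first 20 bytes were never written by any task.

```csharp
using System;
using System.IO;
using System.Text;

static class SeekPastEnd
{
    // Opens the file the same way the question does, seeks to `offset`, writes `data`,
    // and returns the file length afterwards.
    public static long WriteAt(string path, long offset, byte[] data)
    {
        using var fs = new FileStream(path, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.ReadWrite);
        fs.Seek(offset, SeekOrigin.Begin); // allowed even when offset > fs.Length
        fs.Write(data, 0, data.Length);
        return fs.Length;
    }

    static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "seek-demo.bin"); // hypothetical scratch file
        File.Delete(path); // start from an empty file (no error if it does not exist)

        // Simulate the task at offset 20 finishing before the tasks at offsets 0 and 10 write anything.
        long length = WriteAt(path, 20, Encoding.ASCII.GetBytes("cccccccccc"));
        Console.WriteLine(length); // 30

        // Bytes 0..19 were never written by any task; print whatever the OS returns for them.
        byte[] contents = File.ReadAllBytes(path);
        Console.WriteLine(BitConverter.ToString(contents, 0, 20));
    }
}
```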

Answer 2

I think that asynchronous file I/O is not usually meant to be combined with multithreading. Just because something is asynchronous does not mean that an operation should have multiple threads assigned to it.

To quote the documentation for async file I/O: "Asynchronous operations enable you to perform resource-intensive I/O operations without blocking the main thread." Basically, instead of using a bunch of threads on one operation, it dispatches a new thread to accomplish a less meaningful task. Eventually, with a big enough application, nearly everything can be abstracted into a not-so-meaningful task, and computers can run massive apps pretty quickly by utilizing multithreading.

What you are likely experiencing is undefined behavior due to multiple threads overwriting the same location in memory. The Japanese characters you are referring to are likely malformed ASCII/Unicode that your text editor is attempting to interpret.
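One way an editor can turn plain ASCII into CJK-looking characters is by decoding the bytes with the wrong encoding. The sketch below assumes the editor misdetects the file as little-endian UTF-16 (the question does not say which editor was used, so this is only an illustration; MisreadAsUtf16 is a hypothetical helper): each pair of 0x61 bytes becomes the single code unit U+6161, a CJK unified ideograph.

```csharp
using System;
using System.Text;

static class MisdecodeDemo
{
    // Decodes bytes that were written as ASCII as if they were little-endian UTF-16,
    // which is what happens if an editor guesses the wrong encoding (an assumption here).
    public static string MisreadAsUtf16(string asciiText) =>
        Encoding.Unicode.GetString(Encoding.ASCII.GetBytes(asciiText));

    static void Main()
    {
        string shown = MisreadAsUtf16("aaaaaaaaaa"); // ten 0x61 bytes
        Console.WriteLine(shown.Length);                   // 5: every two bytes form one UTF-16 code unit
        Console.WriteLine(((int)shown[0]).ToString("X4")); // 6161, a CJK unified ideograph
    }
}
```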

If you would like to remedy the undefined behavior and keep using asynchronous operations, you should be able to await each individual task before the next one starts. This will prevent the offset variable from being in the wrong position for the newest task. Logically, though, it will run the same as a synchronous version.
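A minimal sketch of that sequential approach, assuming all blocks are known up front and a single FileStream is opened once (WriteAllAsync and the temp file path are hypothetical): each CopyToAsync is awaited before the next block starts, and each offset is taken from the stream's actual position rather than a precomputed blockSize. FileMode.Create truncates old content, which also sidesteps the stale-data issue that OpenOrCreate can cause.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading.Tasks;

static class SequentialWrites
{
    // Writes each block one after another, awaiting each write before starting the next.
    // Returns the offset at which each block was written.
    public static async Task<List<long>> WriteAllAsync(string path, IEnumerable<string> blocks)
    {
        var offsets = new List<long>();
        using var fs = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None,
                                      bufferSize: 4096, useAsync: true);
        foreach (string s in blocks)
        {
            byte[] bytes = Encoding.UTF8.GetBytes(s);
            offsets.Add(fs.Position); // offset comes from the bytes actually written so far
            using var data = new MemoryStream(bytes);
            await data.CopyToAsync(fs); // the next block starts only after this completes
        }
        return offsets;
    }

    static async Task Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "sequential-demo.txt"); // hypothetical
        var offsets = await WriteAllAsync(path, new[] { "aaaaaaaaaa", "bbbbbbbbbb", "cccccccccc", "dddddddddd" });
        foreach (long offset in offsets)
            Console.WriteLine($"Data written at offset - {offset}");
        Console.WriteLine(File.ReadAllText(path));
    }
}
```

The writes stay asynchronous (the calling thread is not blocked while each one is in flight), but they no longer overlap, so the file ends up in the same order a synchronous loop would produce.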

Answer 1: score 1, posted 2022-06-14 19:01:26
Answer 2: score 0, posted 2022-06-14 18:52:02
