How do I process a binary file in parallel in Ruby?

Question

I'm trying to write a function that splits a binary file into chunks and uploads them:

class ChunksClient < ApiStruct::Client
  # Takes the file, splits it into chunks and uploads each chunk to the URL
  # at the corresponding position in array_of_urls
  def upload_chunks(big_file, array_of_urls)
    chunk_size = 5242880 # 5 MiB
    array_of_urls.each do |link|
      chunk = big_file.read(chunk_size)
      upload_chunk(chunk, link)
    end
  end

  def upload_chunk(chunk, link)
    put(path: link, body: chunk, headers: { 'Content-type': 'application/octet-stream' })
  end
end

However, uploading one chunk at a time is slow, so I tried processing them in parallel:

class ChunksClient < ApiStruct::Client
  # Takes the file, splits it into chunks and uploads each chunk into array of urls
  # in corresponding order
  def upload_chunks(big_file, array_of_urls)
    @chunk_size = 5242880
    @index = 0
    @object = big_file
    threads = []
    array_of_urls.each do
      threads << Thread.new do
        chunk, index = take_chunk_with_index
        upload_chunk(chunk, array_of_urls[index])
      end
    end
    threads.each(&:join)
  end

  private

  def upload_chunk(chunk, link)
    put(path: link, body: chunk, headers: { 'Content-type': 'application/octet-stream' })
  end

  def take_chunk_with_index
    index = @index
    chunk = @object.read(@chunk_size)
    @index += 1
    [chunk, index]
  end
end

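But every time it puts the chunks into random links. I could load all the chunks into memory first, but that would cause problems when uploading large files (for example, gigabytes in size).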

Is there a proper way to process a binary file using threads?

Answer 1

You should synchronize the take_chunk_with_index method with a Mutex, like this:

# Create the mutex once, e.g. in upload_chunks before starting the threads
@mutex = Mutex.new

def take_chunk_with_index
  @mutex.synchronize do
    index = @index
    chunk = @object.read(@chunk_size)
    @index += 1
    [chunk, index]
  end
end
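
For anyone who wants to try this without a server, here is a minimal, self-contained sketch of the same idea. It is not the original ChunksClient: the class name ChunkUploader, the workers option, and the upload block are hypothetical stand-ins, a StringIO replaces the real file, and the HTTP PUT is replaced by a block that records what it receives. It also assumes a small fixed number of worker threads instead of one thread per link, so only a few chunks sit in memory at a time.

```ruby
require 'stringio'

# Hypothetical stand-in for ChunksClient (no ApiStruct, no network).
class ChunkUploader
  def initialize(io, urls, chunk_size:, workers: 4, &uploader)
    @io = io
    @urls = urls
    @chunk_size = chunk_size
    @workers = workers
    @uploader = uploader # assumed callable: receives (chunk, url)
    @mutex = Mutex.new
    @index = 0
  end

  def upload_all
    threads = Array.new(@workers) do
      Thread.new do
        # Each worker keeps pulling chunks until the file or the URL list runs out.
        while (pair = take_chunk_with_index)
          chunk, index = pair
          @uploader.call(chunk, @urls[index])
        end
      end
    end
    threads.each(&:join)
  end

  private

  # The read and the index increment happen under one lock, so a chunk
  # is always paired with the index it was read at.
  def take_chunk_with_index
    @mutex.synchronize do
      return nil if @index >= @urls.size

      chunk = @io.read(@chunk_size)
      return nil if chunk.nil? # end of file

      index = @index
      @index += 1
      [chunk, index]
    end
  end
end

# Usage: 10 chunks of 4 bytes each, "0000", "1111", ...
data = (0...10).map { |i| i.to_s * 4 }.join
urls = Array.new(10) { |i| "/chunk/#{i}" }

received = {}
received_lock = Mutex.new

uploader = ChunkUploader.new(StringIO.new(data), urls, chunk_size: 4) do |chunk, url|
  # Stand-in for the real PUT request.
  received_lock.synchronize { received[url] = chunk }
end
uploader.upload_all
```

Uploads may still finish in any order, but each chunk goes to the URL matching its position in the file, because the index is captured inside the same critical section as the read.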

Solution 1
0 2020-02-20 15:20:41
