
How to stop parallel execution on Azure Durable function?

I have an Azure Durable Functions project where I'm working with relatively big CSV files. I have two trigger functions:


  • One trigger function needs to get a file from Azure Blob Storage, where that file is over 100MB, split it into smaller chunks, and put those chunks back into Azure Blob Storage (in a different blob directory).

    This is the trigger orchestration function:

[FunctionName(nameof(FileChunkerOrchestration))]
public async Task Run(
  [OrchestrationTrigger] IDurableOrchestrationContext context, 
  ILogger log)
{
  var blobName = context.GetInput<string>();  
  var chunkCounter = await context.CallActivityAsync<int>(nameof(FileChunkerActivity), blobName);
}

This is the activity function:

[StorageAccount("AzureWebJobsStorage")]
[FunctionName(nameof(FileChunkerActivity))]
public async Task<int> Run(
  [ActivityTrigger] string fileName,
  [Blob("vessel-container/csv-files/{fileName}.csv", FileAccess.Read)] TextReader blob,
  [Blob("vessel-container/csv-chunks", FileAccess.Write)] CloudBlobContainer container)
{
  // Uses the TextReader to create chunk files, then uploads each chunk
  // to the CloudBlobContainer as soon as it is created
}

  • The second trigger function fires for each and every chunk that has been created:

[StorageAccount("AzureWebJobsStorage")]
[FunctionName(nameof(FileTrigger))]
public async Task Run(
  [DurableClient] IDurableClient starter,  
  [BlobTrigger("vessel-container/csv-chunks/{name}.csv")] Stream blob,
  string name,
  ILogger log)
{
  // Processing of the chunk file starts here
}

The problem I'm having is that the second function triggers for each and every CSV chunk file and the invocations run in parallel, which leads my project to use too much of the available RAM.

I need to fix this so that my second function (it's an orchestration) processes the files one by one.

Please share any ideas on how to overcome this problem. Thanks in advance.

A possible solution to this problem is to decouple the processing: catch the events generated by the creation of the CSV chunk files in Azure Storage via Event Grid and write them to a storage queue as the target (minimal setup and pretty easy). The storage queue is then consumed by an Azure Function whose concurrency is limited via settings in host.json (this may collide with other function scale targets in the same function app and may require a second function app with dedicated settings).
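As a rough sketch of the host.json part of this approach, the queue extension settings below limit the consuming function app to one queue message at a time per instance (the specific values here are assumptions for illustration, not from the original answer):

```json
{
  "version": "2.0",
  "extensions": {
    "queues": {
      "batchSize": 1,
      "newBatchThreshold": 0,
      "maxDequeueCount": 5
    }
  }
}
```

With `batchSize` set to 1 and `newBatchThreshold` set to 0, the host fetches and processes a single message at a time per instance. Note this only limits per-instance concurrency; on a Consumption plan the app can still scale out to multiple instances, so to process strictly one file at a time you may additionally need to cap scale-out (e.g. via the `WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT` app setting set to 1), which is why the answer suggests a dedicated function app for this consumer.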

