
Throttling Azure Storage Queue processing in Azure Function App

I have created an Azure Function app with an Azure Storage Queue trigger that processes a queue in which each item is a URL. The function just downloads the content of the URL. I have another function that loads and parses a site's XML sitemap and adds all the page URLs to the queue. The problem is that the Functions app runs too quickly and hammers the website, so the site starts returning server errors. Is there a way to limit or throttle the speed at which the Functions app runs?

I could, of course, write a simple web job that processes the URLs serially (or asynchronously with a limit on the number of concurrent requests), but I really like the simplicity of Azure Functions and wanted to try out "serverless" computing.

There are a few options you can consider.

First, there are some knobs that you can configure in host.json that control queue processing (documented here). The queues.batchSize knob sets how many queue messages are fetched at a time. If it is set to 1, the runtime fetches one message at a time and only fetches the next when processing of that message is complete. This gives you some level of serialization on a single instance; it does not limit how many instances the app scales out to.
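As a rough illustration of the host.json settings described above, a fragment along these lines should limit each instance to one message at a time. It assumes the Functions v2+ file layout, where queue settings live under extensions.queues; in v1 the same keys sit under a top-level queues object. The newBatchThreshold value is an addition that the answer does not mention: it is set to 0 here so that the runtime waits for the current message to finish before fetching another.

```json
{
  "version": "2.0",
  "extensions": {
    "queues": {
      "batchSize": 1,
      "newBatchThreshold": 0
    }
  }
}
```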

Another option is to set the NextVisibleTime on the messages you enqueue so that they are spaced out. By default, enqueued messages become visible and ready for processing immediately.
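The spacing idea can be sketched as a small helper that computes one delay per message. This is only an illustration: the class and method names are hypothetical, and it assumes the delay would be passed as the initial visibility delay when each message is added to the queue, which is what determines its NextVisibleTime.

```csharp
using System;
using System.Collections.Generic;

public static class VisibilitySchedule
{
    // Returns one delay per message, spaced so that roughly `itemsPerMinute`
    // messages become visible each minute. The first message is visible immediately.
    public static List<TimeSpan> SpacedDelays(int count, int itemsPerMinute)
    {
        if (count < 0)
            throw new ArgumentOutOfRangeException(nameof(count));
        if (itemsPerMinute <= 0)
            throw new ArgumentOutOfRangeException(nameof(itemsPerMinute));

        var spacing = TimeSpan.FromSeconds(60.0 / itemsPerMinute);
        var delays = new List<TimeSpan>(count);
        for (var i = 0; i < count; i++)
        {
            // Each delay would be supplied as the initial visibility delay
            // of the corresponding message when it is enqueued.
            delays.Add(TimeSpan.FromTicks(spacing.Ticks * i));
        }
        return delays;
    }
}
```

For example, SpacedDelays(3, 30) yields delays of 0, 2, and 4 seconds, so the sitemap function could enqueue the i-th URL with the i-th delay.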

A final option is to enqueue a single message containing the collection of all URLs for a site, rather than one message per URL. When that message is processed, your function can work through the URLs serially, which limits the parallelism.
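A minimal sketch of that last option, assuming the URL list is stored in the message as a JSON array. The class name, the download callback, and the pause between requests are all hypothetical; the pause is an extra that the answer does not call for. Queue messages are limited to 64 KB, so a large sitemap would probably need to be split across several messages, and a long list may also run into the function execution timeout.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading.Tasks;

public static class SerialUrlProcessor
{
    // Packs all URLs for one site into a single JSON message body.
    public static string ToMessageBody(IEnumerable<string> urls) =>
        JsonSerializer.Serialize(urls);

    // Unpacks the message and handles the URLs one after another,
    // pausing after each request so the target site is not flooded.
    public static async Task ProcessAsync(
        string messageBody,
        Func<string, Task> download,      // hypothetical callback that fetches one URL
        TimeSpan pauseBetweenRequests)
    {
        var urls = JsonSerializer.Deserialize<List<string>>(messageBody)
                   ?? new List<string>();
        foreach (var url in urls)
        {
            await download(url);          // only one request is in flight at a time
            await Task.Delay(pauseBetweenRequests);
        }
    }
}
```

The queue-triggered function would then call ProcessAsync with the message text and whatever download logic it already has.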

NextVisibleTime can get messy if several functions add to the queue in parallel. Another simple option for anyone having this problem: create another queue, "throttled-items", and have your original function use it as its queue trigger instead. Then add a simple timer function that moves messages from the original queue to the throttled queue every minute, spacing the NextVisibleTime accordingly.

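    using System;
    using System.Linq;
    using System.Threading.Tasks;
    using Microsoft.Azure.Storage.Queue; // Microsoft.WindowsAzure.Storage.Queue in older SDK versions
    using Microsoft.Azure.WebJobs;
    using Microsoft.Extensions.Logging;

    public static class ThrottleQueueItems
    {
        [FunctionName("ThrottleQueueItems")]
        public static async Task Run(
            [TimerTrigger("0 * * * * *")] TimerInfo timer,          // fires once a minute
            [Queue("original-items")] CloudQueue originalQueue,     // placeholder name for the original queue
            [Queue("throttled-items")] CloudQueue throttledQueue,   // queue that triggers the worker function
            ILogger logger)
        {
            var itemsPerMinute = 60; // get from app settings
            var individualDelay = 60.0 / itemsPerMinute; // seconds between messages
            var totalRetrieved = 0;
            var maxItemsInBatch = 32; // upper bound per GetMessagesAsync call (the service allows at most 32)
            do
            {
                var pending = (await originalQueue.GetMessagesAsync(
                    Math.Min(maxItemsInBatch, itemsPerMinute - totalRetrieved))).ToArray();
                if (!pending.Any())
                    break;
                foreach (var message in pending)
                {
                    // Re-enqueue with a staggered initial visibility delay, then remove the original.
                    await throttledQueue.AddMessageAsync(
                        new CloudQueueMessage(message.AsString),
                        null, // timeToLive: use the default
                        TimeSpan.FromSeconds(individualDelay * ++totalRetrieved),
                        null, null);
                    await originalQueue.DeleteMessageAsync(message);
                }
            } while (itemsPerMinute > totalRetrieved);
        }
    }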

I found this post when trying to solve a similar problem, so this might be useful to anyone else who arrives here. You can now limit the number of concurrent instances of the function app using the WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT app setting. Setting this to 1, combined with a batch size of 1, would allow you to perform serial processing of a queue.

WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT

The maximum number of instances that the function app can scale out to. Default is no limit.

https://docs.microsoft.com/en-gb/azure/azure-functions/functions-app-settings#website_max_dynamic_application_scale_out
