
.NET Core API application timeout exception hosted in Microsoft Azure (Redis cache, SQL, .NET Core)

I have the following infrastructure: a .NET Core 3.1 API hosted in a VNet. Inside the VNet we have 8 servers behind a load balancer, plus SQL Server and a Redis cache.

We are running an API load test of 1200 operations per second against the login operation (which is not a lightweight operation). During the test the load on all servers is 5-10%, but the problem is we're getting API timeouts and Redis timeouts.

It seems like something is blocking our threads.

This is from my Startup.cs (we have tried playing with the values, but with no success):

  // Raise the minimum worker-thread count so the pool does not throttle
  // thread injection under a sudden burst of load; reuse the current
  // completion-port maximum as the completion-port minimum.
  var threadCount = 2000;
  ThreadPool.GetMaxThreads(out _, out var completionThreads);
  ThreadPool.SetMinThreads(threadCount, completionThreads);

This is from the *.csproj file:

 <PropertyGroup>
   <ThreadPoolMinThreads>315</ThreadPoolMinThreads>
 </PropertyGroup>

Update 1 -> Redis issue information added.

Redis error:

  StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (outbound=0KiB, inbound=0KiB, 10008ms elapsed, timeout is 10000ms), command=GET, next: SET key_digievents____freeevent_4072, inst: 0, qu: 0, qs: 684, aw: False, rs: ReadAsync, ws: Idle, in: 2197285, in-pipe: 0, out-pipe: 0, serverEndpoint: 10.0.0.34:6379, mc: 1/1/0, mgr: 10 of 10 available, clientName: akssocial27apiapp-xkkb4, IOCP: (Busy=0,Free=1000,Min=1000,Max=1000), WORKER: (Busy=430,Free=32337,Min=315,Max=32767), v: 2.1.58.34321
  at Datadog.Trace.ClrProfiler.Integrations.StackExchange.Redis.ConnectionMultiplexer.ExecuteAsyncImplInternal[T](Object multiplexer, Object message, Object processor, Object state, Object server, Func`6 originalMethod)
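For context, the "timeout is 10000ms" part of that message is the client-side operation timeout of the multiplexer. A minimal sketch of how those timeouts are typically made explicit when connecting with StackExchange.Redis (the endpoint and values below are illustrative, not our production configuration):

  using StackExchange.Redis;

  var options = new ConfigurationOptions
  {
      EndPoints = { "10.0.0.34:6379" },   // illustrative endpoint
      SyncTimeout = 10000,                // ms allowed for a synchronous operation
      AsyncTimeout = 10000,               // ms allowed for an asynchronous operation
      ConnectTimeout = 5000,              // ms allowed to establish the connection
      AbortOnConnectFail = false          // keep retrying instead of failing fast
  };

  // A single multiplexer should be created once and shared across the application.
  var multiplexer = ConnectionMultiplexer.Connect(options);
  IDatabase db = multiplexer.GetDatabase();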

I would be glad for any advice. Thanks in advance.

Many people hit a TimeoutException when they upgrade to 2.x:

https://github.com/StackExchange/StackExchange.Redis/issues/1226

This section of it might help you: "Are you seeing a high number of busyio or busyworker threads in the timeout exception?"

At the end of the post, it says:

In .NET Core, add the environment variable COMPlus_ThreadPool_ForceMinWorkerThreads to override the default MinThreads setting, per the Environment/Registry Configuration Knobs documentation. You can also use the same ThreadPool.SetMinThreads() method as described above.
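As a quick sanity check, here is a minimal sketch that logs the effective thread-pool limits at startup, so you can confirm that SetMinThreads() or the COMPlus_ThreadPool_ForceMinWorkerThreads variable actually took effect (purely diagnostic; these are the values the Redis timeout message reports as WORKER/IOCP Min/Max):

  using System;
  using System.Threading;

  ThreadPool.GetMinThreads(out int minWorker, out int minIocp);
  ThreadPool.GetMaxThreads(out int maxWorker, out int maxIocp);

  Console.WriteLine($"Min worker: {minWorker}, min IOCP: {minIocp}");
  Console.WriteLine($"Max worker: {maxWorker}, max IOCP: {maxIocp}");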

Below are my notes on the question I asked. I hope this helps someone and saves a lot of time.

First of all, we had no exceptions, errors, or reports indicating that bandwidth was a bottleneck in the Azure infrastructure. That was just our assumption. To counter that assumption we increased capacity so much that even the MS Azure team said we were over-provisioned relative to our usage. So bandwidth has never been the issue. The real limitations were:

  1. The StackExchange.Redis NuGet package, especially when it handles larger payloads.

  2. Our analysis showed we were calling lots of unnecessary endpoints, or fetching data that the page in question did not need.

So from my point of view, we needed to work out how to use the StackExchange.Redis package to handle only the minimum necessary data, and how to reduce calls to unwanted endpoints from the front end, etc.
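The post does not show the exact change we made, but one possible shape of such an optimization is to batch related GET operations on a single multiplexer so they are flushed to the server together; a sketch, assuming several keys are needed per request (the helper name and usage are made up for illustration):

  using System.Threading.Tasks;
  using StackExchange.Redis;

  static async Task<RedisValue[]> GetManyAsync(IDatabase db, params RedisKey[] keys)
  {
      // Queue all GETs on one batch so they are sent to the server in a single flush.
      IBatch batch = db.CreateBatch();
      var pending = new Task<RedisValue>[keys.Length];
      for (int i = 0; i < keys.Length; i++)
      {
          pending[i] = batch.StringGetAsync(keys[i]);
      }
      batch.Execute();                     // flushes the queued commands
      return await Task.WhenAll(pending);  // one await for all replies
  }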

My teammate had been in touch with the developer of StackExchange.Redis. The developer admitted that it has limitations with huge data chunks. He also told us that we were not the only ones experiencing this issue.

So after all the discussions, we optimized our GET/SET calls. That allowed us to handle the data for the necessary endpoints in a more precise and effective way, cutting off all unwanted endpoints and calls. We also implemented some compression on the backend side.
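The post does not say which compression algorithm was used; below is a minimal sketch assuming GZip, compressing values before SET and decompressing after GET (the helper names are made up for illustration):

  using System.IO;
  using System.IO.Compression;
  using System.Text;
  using StackExchange.Redis;

  static void SetCompressed(IDatabase db, RedisKey key, string value)
  {
      using var buffer = new MemoryStream();
      using (var gzip = new GZipStream(buffer, CompressionLevel.Fastest, leaveOpen: true))
      {
          byte[] raw = Encoding.UTF8.GetBytes(value);
          gzip.Write(raw, 0, raw.Length);
      }
      // RedisValue accepts a byte[] directly, so the compressed payload is stored as-is.
      db.StringSet(key, buffer.ToArray());
  }

  static string GetCompressed(IDatabase db, RedisKey key)
  {
      byte[] stored = db.StringGet(key);   // null if the key does not exist
      if (stored == null) return null;

      using var input = new MemoryStream(stored);
      using var gzip = new GZipStream(input, CompressionMode.Decompress);
      using var reader = new StreamReader(gzip, Encoding.UTF8);
      return reader.ReadToEnd();
  }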

On top of this, we added additional regions, which helped improve the situation.

Nevertheless, we are thinking about dropping Redis and moving to a NoSQL solution. That would resolve several issues at once: cache issues, real-time data, and SQL limitations on big data. (So far, this is only at the idea stage.)

PS: Factors impacting Redis performance:

  • In many real-world scenarios, Redis throughput is limited by the network well before it is limited by the CPU. To consolidate several high-throughput Redis instances on a single server, it is worth considering a 10 Gbit/s NIC or multiple 1 Gbit/s NICs with TCP/IP bonding.
  • CPU is another very important factor. Being single-threaded, Redis favors fast CPUs with large caches rather than many cores.
