
.NET Core API timeout exceptions when hosted in Microsoft Azure (Redis cache, SQL, .NET Core)

I have the following infrastructure: a .NET Core 3.1 API hosted in a VNet. Inside the VNet we have 8 servers behind a load balancer, plus SQL Server and Redis Cache.

We are running an API load test at 1200 operations per second against the login operation (which is not a lightweight operation). During the test the load on all servers is only 5-10%, yet we are getting API timeouts and Redis timeouts.

It seems like something is blocking our threads.

This is from my Startup.cs (we have tried different values, without success):

  var threadCount = 2000; 
  ThreadPool.GetMaxThreads(out _, out var completionThreads); 
  ThreadPool.SetMinThreads(threadCount, completionThreads);

This is from the *.csproj file:

  <PropertyGroup>
    <ThreadPoolMinThreads>315</ThreadPoolMinThreads>
  </PropertyGroup>

Update 1: added the Redis timeout details.

Redis error:

  StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (outbound=0KiB, inbound=0KiB, 10008ms elapsed, timeout is 10000ms), command=GET, next: SET key_digievents____freeevent_4072, inst: 0, qu: 0, qs: 684, aw: False, rs: ReadAsync, ws: Idle, in: 2197285, in-pipe: 0, out-pipe: 0, serverEndpoint: 10.0.0.34:6379, mc: 1/1/0, mgr: 10 of 10 available, clientName: akssocial27apiapp-xkkb4, IOCP: (Busy=0,Free=1000,Min=1000,Max=1000), WORKER: (Busy=430,Free=32337,Min=315,Max=32767), v: 2.1.58.34321
     at Datadog.Trace.ClrProfiler.Integrations.StackExchange.Redis.ConnectionMultiplexer.ExecuteAsyncImplInternal[T](Object multiplexer, Object message, Object processor, Object state, Object server, Func`6 originalMethod)

I will be glad for any advice. Thanks in advance.

Many people run into TimeoutException after upgrading to StackExchange.Redis 2.x:

https://github.com/StackExchange/StackExchange.Redis/issues/1226

This section of the StackExchange.Redis timeout guidance might help you: "Are you seeing a high number of busyio or busyworker threads in the timeout exception?"
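To answer that question for a given log line, the thread-pool counters at the end of the exception message can be compared directly. The sketch below is only an illustration: the class and method names are hypothetical, and the regular expression assumes the `IOCP: (Busy=..,Free=..,Min=..,Max=..)` layout shown in the error above. As I understand the StackExchange.Redis guidance, a pool whose Busy count exceeds its Min count is a hint that requests may be waiting for new threads to be injected.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of StackExchange.Redis.
public static class RedisTimeoutDiagnostics
{
    // Matches e.g. "WORKER: (Busy=430,Free=32337,Min=315,Max=32767)".
    private static readonly Regex PoolStats = new Regex(
        @"(?<pool>IOCP|WORKER): \(Busy=(?<busy>\d+),Free=(?<free>\d+),Min=(?<min>\d+),Max=(?<max>\d+)\)");

    // Returns true when the named pool ("IOCP" or "WORKER") reports Busy > Min.
    // Returns false when Busy <= Min or when the pool is not found in the message.
    public static bool IsPoolOverMin(string exceptionMessage, string pool)
    {
        foreach (Match m in PoolStats.Matches(exceptionMessage))
        {
            if (m.Groups["pool"].Value != pool) continue;
            int busy = int.Parse(m.Groups["busy"].Value);
            int min = int.Parse(m.Groups["min"].Value);
            return busy > min;
        }
        return false;
    }
}
```

For the message in the question, the WORKER pool reports Busy=430 against Min=315, so `IsPoolOverMin(message, "WORKER")` is true, while the IOCP pool (Busy=0, Min=1000) is not over its minimum.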

At the end of the post, it says:

In .Net Core, add Environment Variable COMPlus_ThreadPool_ForceMinWorkerThreads to overwrite default MinThreads setting, according to Environment/Registry Configuration Knobs - You can also use the same ThreadPool.SetMinThreads() Method as described above.
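A minimal sketch of the `ThreadPool.SetMinThreads()` route follows. The class and method names are hypothetical. Unlike the Startup.cs snippet in the question, which passes the maximum completion-port count as the minimum (consistent with the `IOCP: Min=1000,Max=1000` in the error), this version reads the current minimums first, changes only the worker minimum, and reports whether the runtime accepted the values.

```csharp
using System;
using System.Threading;

// Hypothetical helper for illustration.
public static class ThreadPoolSetup
{
    // Raises the worker-thread minimum to at least desiredWorkerMin and
    // leaves the completion-port (IOCP) minimum unchanged.
    // Returns false if the runtime rejected the requested values.
    public static bool RaiseMinWorkerThreads(int desiredWorkerMin)
    {
        ThreadPool.GetMinThreads(out int currentWorkerMin, out int currentIocpMin);

        // Never lower a minimum that is already higher than the requested one.
        int workerMin = Math.Max(currentWorkerMin, desiredWorkerMin);

        return ThreadPool.SetMinThreads(workerMin, currentIocpMin);
    }
}
```

Calling something like `ThreadPoolSetup.RaiseMinWorkerThreads(500)` once at startup and logging the return value makes it visible when the setting did not take effect. The value 500 is only a placeholder; the right number depends on the workload.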

Below are my notes on the question I asked. I hope they help someone and save a lot of time.

First of all, we had no exceptions, errors, or reports indicating that bandwidth was a bottleneck in the Azure infrastructure; that was only our assumption. To rule it out, we increased capacity so much that even the MS Azure team said we were over-provisioned relative to our usage. So bandwidth was never the issue. The actual limitations were:

  1. The StackExchange.Redis NuGet package, especially when it handles larger payloads.

  2. Our analysis showed that we were calling many unnecessary endpoints, or fetching data that a given page did not need.

So, from my point of view, we needed to optimize how we use the StackExchange.Redis package so that it handles only the minimum necessary data, and to reduce calls to unneeded endpoints from the front end.

My teammate had been in touch with the developer of StackExchange.Redis. The developer acknowledged that the library has limitations with huge data chunks, and told us that we were not the only ones experiencing this issue.

After all the discussions, we optimized our GET/SET calls. This let us handle data more precisely and efficiently for the endpoints that need it, and cut out all the unneeded endpoints and calls. We also implemented a form of compression on the backend.
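The notes do not say which compression was used. As one possible illustration, here is a sketch that gzips a serialized value before it is written to the cache and restores it after reading. The class name and the choice of GZip with UTF-8 strings are assumptions, not the approach described above.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: shrinks cache payloads before SET and restores them after GET.
public static class CachePayload
{
    public static byte[] Compress(string value)
    {
        byte[] raw = Encoding.UTF8.GetBytes(value);
        using (var output = new MemoryStream())
        {
            // leaveOpen keeps the MemoryStream usable after the GZipStream is disposed.
            using (var gzip = new GZipStream(output, CompressionLevel.Fastest, leaveOpen: true))
            {
                gzip.Write(raw, 0, raw.Length);
            } // disposing the GZipStream writes the final compressed block
            return output.ToArray();
        }
    }

    public static string Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return Encoding.UTF8.GetString(output.ToArray());
        }
    }
}
```

The resulting byte array can be stored as the Redis value in place of the original string. Compression trades CPU time for fewer bytes on the wire, so it is worth measuring on real payloads; small values may not benefit.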

Finally, we added additional regions, which improved the situation.

Nevertheless, we are thinking about dropping Redis and moving to a NoSQL solution. That would resolve several issues at once: the cache issues, real-time data, and SQL limitations on big data. (So far this is only an idea.)

PS. Factors impacting Redis performance:

  • In many real-world scenarios, Redis throughput is limited by the network well before being limited by the CPU. To consolidate several high-throughput Redis instances on a single server, it is worth considering a 10 Gbit/s NIC or multiple 1 Gbit/s NICs with TCP/IP bonding.
  • CPU is another very important factor. Being single-threaded, Redis favors fast CPUs with large caches and not many cores.

