
ASP.NET .NET 4.5 Application crashes in IIS periodically and I can't figure out the cause

I have a .NET 4.5 ASP.NET Web API application. It is deployed in IIS with a single worker process on an 8 GB VM with 4 CPUs.

I recently made changes to it (upgraded ServiceStack.Interfaces, ServiceStack.Common, ServiceStack.Redis and a bunch of dependencies) and started noticing that the IIS app pool this app is deployed on recycles about once an hour (give or take a few minutes).

Nothing in my application logs shows any kind of issue. I collect metrics using Telegraf and I do NOT see memory increase at all; in every metric I look at, everything looks absolutely normal, and then the app pool recycles.

I looked at the Event Viewer, filtered the logs by the WAS source, and see events with ID 5011, which, as I understand it, basically means the IIS worker process crashed.

So I then ran DebugDiag on my local box with the app deployed there (I can reproduce the issue locally). It ran for a while and I finally got the same event in the Event Viewer. I looked at the crash analysis logs from DebugDiag, and all I see is a bunch of exceptions logged but nothing concrete right before the crash.

At this point I'm not entirely sure what else I can do to figure out what's causing the crash, so I'm hoping for suggestions on how to get more visibility.

What I think is happening is that there is some incompatibility between one of my dependencies and some of the upgraded packages, which causes an exception to be thrown that is not handled by anything and crashes the IIS worker.

My application is working perfectly fine: all API endpoints function with no issues, memory is NOT increasing, and CPU is fine. So as far as I can tell there are no issues up to the crash.

I'm wondering if anyone knows any tricks to find what's causing the crash and/or handle it, to prevent this exception from escaping and crashing the worker.

I was able to narrow it down, with some confidence, to somewhere within the ServiceStack.Redis RedisPubSubServer. What the actual issue is, I don't know, as that would take a lot more time to dig into and I've spent too much time already.

However, piggybacking on some existing code I had (from before ServiceStack supported sentinel), I created a new implementation of the Redis client wrapper, which I call LazySentinelServiceStackClientWrapper. Instead of using the built-in sentinel manager, it relies on a custom sentinel provider I created, LazySentinelApiSentinelProvider. This implementation interrogates the available sentinel hosts in random order for the master and slave nodes; I then construct a pool from the retrieved read/write and read-only hosts, and this pool is used to run the Redis operations. The pool is refreshed whenever an error occurs (after a failover). This is in contrast to the built-in sentinel manager that comes with ServiceStack.Redis, which instantiates a Redis pub/sub server, listens for messages from sentinel whenever configuration changes such as failovers occur, and updates the managed Redis connection pool.
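The lazy approach described above can be sketched roughly as follows. This is a language-neutral illustration in Python, not the actual wrapper and not the ServiceStack.Redis API: the class name `LazySentinelPool`, the injected `query` callable (which stands in for asking one sentinel for the current master and replicas), and the retry-once policy are all assumptions made for the example.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")

# (master, replicas), e.g. ("10.0.0.1:6379", ["10.0.0.2:6379"])
Topology = Tuple[str, List[str]]


class LazySentinelPool:
    """Hypothetical sketch of a lazily refreshed sentinel-backed pool."""

    def __init__(self, sentinels: Sequence[str], query: Callable[[str], Topology]):
        self._sentinels = list(sentinels)
        self._query = query  # hypothetical: asks one sentinel for the topology
        self._topology: Optional[Topology] = None

    def _discover(self) -> Topology:
        # Try the sentinels in random order until one of them answers.
        last_error: Optional[Exception] = None
        for host in random.sample(self._sentinels, len(self._sentinels)):
            try:
                return self._query(host)
            except OSError as exc:  # includes ConnectionError and timeouts
                last_error = exc
        raise ConnectionError("no sentinel reachable") from last_error

    def run(self, operation: Callable[[str, List[str]], T]) -> T:
        # Resolve the topology on first use only; no background listener.
        if self._topology is None:
            self._topology = self._discover()
        try:
            return operation(*self._topology)
        except OSError:
            # Possibly a failover: refresh the hosts and retry once.
            self._topology = self._discover()
            return operation(*self._topology)
```

The point of this design is that no pub/sub subscription or background thread is involved: the topology is only looked up on first use and again after an operation fails, which matches the "refresh whenever an error occurs" behaviour described above.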

I installed my version of this Redis client wrapper into my application and have seen no app pool recycle events since (other than the scheduled ones).

[image: log of app pool recycle events]

Above is the log of app pool recycle events before I disabled the ServiceStack.Redis sentinel manager.

And here's the log of app pool recycle events after installing my new lazy sentinel manager:

[image: log of app pool recycle events]

The first spike is me recycling the app manually and the second one is the scheduled 1am recycle. So clearly the issue is solved.

Why the sentinel manager, via the Redis pub/sub server, causes IIS rapid-fail protection to fire and recycle the app pool, I do not know. Maybe someone with much more Redis and/or IIS experience can explain that. Also, I did not test this in .NET Core; I only tested a .NET 4.5.1 application deployed in IIS, but on many different machines, including a local development machine and beefy production machines.

One last note: the first image, which shows all the recycle events, is from my CI machine, which is barely taking any traffic, maybe 1 request every few minutes. So the issue is not a memory leak or resource exhaustion. Whatever the issue is, it happens regardless of traffic, CPU load, or memory load; it just happens periodically.

Needless to say, I will not be using the built-in sentinel manager, at least for now.


 