Azure worker role recycling with unhandled Service Bus fault message

I have been running an Azure worker role deployment that uses the Microsoft.ServiceBus 2.2 library to respond to jobs posted from other worker roles and web roles. Recently (suspiciously around the time of the OS update discussed here), the instances of the cluster started recycling constantly: rebooting, running for a short period of time, and then recycling again.

From the trace messages in my diagnostics, I can confirm that the role instances make it all the way through the OnStart() method of my RoleEntryPoint. Occasionally, the Instances pane of the Azure Management Portal would mention that a recycling role had experienced an "unhandled exception," but would not give more detail. After logging in to one of the instances with remote desktop, the two clues I have are:

  1. Performance counters indicate that \Processor(_Total)\% Processor Time is hovering at 100%, periodically dropping to the mid-80s, coinciding with drops in \TCPv4\Connections Established. Some drops in \TCPv4\Connections Established do not correlate with drops in \Processor(_Total)\% Processor Time.
  2. In the Local Server events in the Server Manager of one of the instances, I was able to find the following message:

    Application: WaWorkerHost.exe
    Framework Version: v4.0.30319
    Description: The process was terminated due to an unhandled exception.
    Exception Info: Microsoft.ServiceBus.Common.CallbackException
    Stack:
       at Microsoft.ServiceBus.Common.Fx+IOCompletionThunk.UnhandledExceptionFrame(UInt32, UInt32, System.Threading.NativeOverlapped*)
       at System.Threading._IOCompletionCallback.PerformIOCompletionCallback(UInt32, UInt32, System.Threading.NativeOverlapped*)

There have been no permissions configuration changes associated with the Service Bus during this time, and this message occurs even though we have not updated any of our VMs. Nonetheless, our service appears to still be functioning: jobs are being processed and removed from the Service Bus queues that the roles are listening to.

Most Googling on these issues turns up suggestions that this is somehow related to IntelliTrace. However, these VMs do not have IntelliTrace enabled.

Does anyone have any ideas on what is going on here?

The Service Bus exceptions turned out to be a red herring as far as the crashing is concerned. They were caused by a namespace conflict in one of the data contracts sent between two different VM roles that were published at different times. Adding additional tracing to exceptions thrown during one of the receive retries revealed it. It is still a mystery why it was working at all, and the role recycling has not ceased; only the Service Bus exception has.

I had a similar issue. The main cause was that the runtime could not resolve the Service Bus DLL version. Make sure the version you are redirecting to in your configuration file (the binding redirect in app.config) is the same as the version you actually added a reference to. This can happen with any DLL version mismatch, not only with the Service Bus DLL.
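For illustration, a binding redirect for the Service Bus assembly in app.config might look roughly like the fragment below. This is only a sketch: the version 2.2.0.0 is assumed from the "Microsoft.ServiceBus 2.2" library mentioned in the question, and the public key token should be checked against the assembly you actually reference before relying on it.

```xml
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <!-- Assumption: verify the token against the referenced Microsoft.ServiceBus.dll -->
        <assemblyIdentity name="Microsoft.ServiceBus"
                          publicKeyToken="31bf3856ad364e35"
                          culture="neutral" />
        <!-- Assumption: newVersion must equal the assembly version of the DLL
             that is deployed with the role (2.2.0.0 is taken from the question) -->
        <bindingRedirect oldVersion="0.0.0.0-2.2.0.0" newVersion="2.2.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>
```

If the `newVersion` here differs from the version of the DLL in the deployed package, the assembly load can fail at runtime, which is the kind of mismatch described above.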
