
Continuous Performance Monitoring of .NET Applications in Production?

Original: 2010-08-09 15:12:28 6 2 .net/ performance/ monitoring/ production/ continuous

Given a relatively typical .NET 4 system in an SOA environment (i.e. Windows Server 2008 R2, RESTful web services on IIS 7, Windows Services for NServiceBus messaging, SQL Server 2008 R2, etc.), what are the best practices or de facto solutions (without an enterprise price tag) for 24x7 performance monitoring in production?

I'm interested not so much in how much CPU, memory, or disk I/O the system consumes, but rather in, for example, how many createAccount() calls were made per minute, what the average time of the generateResponse() method is, and in detecting unusual spikes in the delta between, for example, generateResponseStarted and generateResponseComplete (the method was invoked, which in turn can call a 3rd party, and the response is ready to be returned, respectively).
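
The application-level numbers described above (calls per minute, average duration, the started-to-complete delta) can be collected in-process with very little machinery. What follows is a minimal sketch, written in Python for brevity; it is not .NET code and not any particular library's API. `CallMetrics`, `measure`, `calls_in_last`, and `average_seconds` are hypothetical names chosen for illustration. In a .NET service the same pattern could be built on `System.Diagnostics.Stopwatch`.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class CallMetrics:
    """Hypothetical in-process recorder of per-operation timings."""

    def __init__(self, clock=time.perf_counter):
        # The clock is injectable so the recorder can be tested deterministically.
        self._clock = clock
        # operation name -> list of (finish_time, duration_seconds)
        # A long-running service would prune or aggregate this list.
        self._samples = defaultdict(list)

    @contextmanager
    def measure(self, name):
        """Time the wrapped block, even if it raises."""
        started = self._clock()
        try:
            yield
        finally:
            finished = self._clock()
            self._samples[name].append((finished, finished - started))

    def calls_in_last(self, name, window_seconds=60.0):
        """Count calls that finished within the trailing window."""
        cutoff = self._clock() - window_seconds
        return sum(1 for finished, _ in self._samples[name] if finished >= cutoff)

    def average_seconds(self, name):
        """Mean duration over all recorded calls, or 0.0 if there are none."""
        durations = [seconds for _, seconds in self._samples[name]]
        return sum(durations) / len(durations) if durations else 0.0
```

Wrapping a call as `with metrics.measure("generateResponse"): ...` records one sample; `calls_in_last("createAccount")` and `average_seconds("generateResponse")` then answer the two questions above. The overhead per call is two clock reads and a list append.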

After some googling, it seems the options are low-level profilers (like dotTrace), or implementing Performance Counters and consuming them with PerfMon or some other OpManager-type product.

What would you recommend? Would implementing performance counters for a live application significantly degrade performance on a production system? If not, are there any good libraries that streamline the implementation in .NET? If yes, how do people monitor their applications' performance beyond memory, disk, and CPU?

@Ryan Hayes

Thanks, I'm looking for a way to spot unusual slowdowns or spikes on production systems. For example, everything was fine during stress testing, but for some reason a 3rd party we rely on is having problems, or the DB is slowing down due to locking, or the SAN is giving way, or some other unexpected scenario occurs. Low-level profiling adds too much overhead, while turning counters on only once there is a problem is too late. Plus, we would be missing historical data to compare against (I would need some sort of alert system for when the delta is outside an acceptable threshold). I'm wondering how people monitor the performance of their production systems and, in their experience, what the best approach is for monitoring that is not related to memory, CPU, or the server itself.
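
One way to read "delta outside an acceptable threshold" is to compare each new duration against a rolling baseline of recent history. The sketch below is an illustration under that assumption, in Python for brevity; `SpikeDetector`, its parameters, and the "mean plus N standard deviations" rule are hypothetical choices, not something the posts above prescribe.

```python
from collections import deque
from statistics import mean, pstdev


class SpikeDetector:
    """Hypothetical detector: flags durations far above the recent baseline."""

    def __init__(self, history_size=100, min_samples=10, sigmas=3.0, min_delta=0.05):
        self._history = deque(maxlen=history_size)  # recent normal durations
        self._min_samples = min_samples  # do not alert until a baseline exists
        self._sigmas = sigmas            # allowed deviation, in standard deviations
        self._min_delta = min_delta      # floor (seconds) so a flat baseline is not hypersensitive

    def observe(self, seconds):
        """Record one duration; return True if it counts as a spike."""
        is_spike = False
        if len(self._history) >= self._min_samples:
            baseline = mean(self._history)
            spread = pstdev(self._history)
            allowed = max(self._sigmas * spread, self._min_delta)
            is_spike = seconds > baseline + allowed
        if not is_spike:
            # Only normal samples feed the baseline, so one outage
            # does not raise the threshold for the next one.
            self._history.append(seconds)
        return is_spike
```

A caller would feed every measured delta to `observe` and notify someone when it returns `True`. Excluding spikes from the history keeps the baseline stable, at the cost that a permanent slowdown keeps alerting until the detector is reset.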

2 Answers

You can try AlertGrid. It looks like it could be a solution to your problem.

You can send various parameters to AlertGrid from your application (a new account name, the time taken to execute some important piece of logic, and so on). The AlertGrid service can do a couple of things with your data. First of all, it can process notification rules built on the parameters you've sent (for example: if the time taken to do something important > X seconds -> send an SMS to the person in charge).

In two weeks AlertGrid is going to get a bunch of new features. The most important one for you will probably be the possibility to plot parameters received from your system.

Please note that AlertGrid cannot detect parameters from your systems; you need to send them instead. This might look like an additional piece of work, but we think it is comparable to the time required for installing and configuring some specialized tools. On the other hand, thanks to this approach AlertGrid overcomes some limitations (it can be integrated with anything that can send HTTP requests).

I believe it will be much easier to understand once you create an account in AlertGrid and go through its interactive tutorial.

As you might have noticed, I'm a developer on the AlertGrid team :)

Disclaimer: At the moment of writing we know that AlertGrid prices are going to be reduced in the near future, so don't go by the current ones; you can contact our support line for more information on pricing. A free account is available and should be enough to begin with.

The real question here is: what are you trying to learn from the performance monitoring?

Do you want to make your code faster? Then I would suggest using profiling tools on a test environment to find out where you can improve your code.
Do you want to find out the maximum beating your system can handle? Then I would suggest performing load testing on a test environment. If you know exactly how hard you can push your system without destroying it, then you won't need to put monitoring into production.

For production, you probably want to maximize performance. To do this, it's common to push a test environment hard and get solid metrics so that you don't need to put performance monitors in place in production. In production, you just want to know when you hit that peak, and then degrade gracefully or do whatever you see fit. Generally, good logging is the best way to monitor system (as opposed to hardware) performance and to keep a record of unusual performance quirks.
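
As an illustration of the logging approach, the sketch below logs how long each call took and raises the log level when a call is slower than a chosen limit. It is written in Python for brevity and is only one possible shape; `log_duration`, `slow_after_seconds`, the `"perf"` logger name, and `generate_response` are hypothetical names, not part of the answer above.

```python
import functools
import logging
import time

logger = logging.getLogger("perf")  # hypothetical logger name


def log_duration(slow_after_seconds=1.0):
    """Decorator factory: log each call's duration, WARNING if it was slow."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            started = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs whether the call returned or raised.
                elapsed = time.perf_counter() - started
                level = logging.WARNING if elapsed > slow_after_seconds else logging.INFO
                logger.log(level, "%s took %.3f s", func.__name__, elapsed)

        return wrapper

    return decorator


@log_duration(slow_after_seconds=0.5)
def generate_response(payload):
    # Stand-in for real work, e.g. a call to a 3rd party.
    return payload.upper()
```

With timestamps enabled in the log format, the resulting log doubles as the historical record of "performance quirks": slow calls can be found by filtering on WARNING, and normal timings remain available at INFO for comparison.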

Every system is different though, and your mileage may vary. Take this as a suggestion rather than the way EVERYONE does it, because there are always exceptional cases where you may have to have profiling running in production.