
Intermittent, uncatchable 500 Internal Server Error with ASP.NET on IIS 8.5 in an Azure cloud service

Let's start with a little background information. I am running a very simple ASP.NET MVC Azure cloud service (a web role, Windows Server 2012 R2 with IIS 8.5). This service receives statistics from a Flash client and from JavaScript; the Flash client posts data roughly every 10 seconds, for a potentially very large number of clients. The service contains only a single controller with two simple actions that take a bunch of parameters (representing the individual statistics, which are sent in various combinations). All the service does is set the CORS headers and cookies on the response (the clients/JavaScript can be embedded on arbitrary domains), verify the integrity of the received data, and then store it in an Azure Table storage account.

To ensure our service operates optimally we use New Relic to track service performance, and to ensure that our data is accurate (i.e. that we successfully record all received messages) we implemented a custom error handling solution so we can fix any problems or bugs that might arise.

We load tested our service with JMeter and encountered no problems, but now that it is deployed to a live environment and in real use, we are seeing occasional 500 Internal Server Errors (approx. 5% of requests). The big problem is that our own error handling code does not detect these errors. New Relic, however, does report certain requests as generating a 500 Internal Server Error (with no further information such as a stack trace, and sometimes with, sometimes without, the reported parameters).

Our custom error handling consists of an HTTP module which subscribes to both the AppDomain.CurrentDomain.UnhandledException and the context.Error events. In theory this should catch (and then log) any exceptions which are not already being caught (and logged) inside our own code. The relevant web.config sections are configured as follows:

<customErrors mode="On" redirectMode="ResponseRewrite" defaultRedirect="~/500.aspx">
  <error statusCode="404" redirect="~/404.aspx" />
  <error statusCode="500" redirect="~/500.aspx" />
</customErrors>

and

<httpErrors existingResponse="Replace">
  <clear />
  <error statusCode="404" path="404.html" responseMode="File" />
  <error statusCode="500" path="500.html" responseMode="File" />
</httpErrors>
<modules>
  <add type="namespace.UnhandledExceptionModule" name="UnhandledExceptionModule" preCondition="managedHandler" />
</modules>

However, this is not the case. I have tried turning on all kinds of logging but the IIS logs are useless (they only show that a 500 response was returned, but no other useful information). The only useful information I have been able to gather is from the failed request traces, but I have not been able to determine what the actual problem is from that information (googling the error code or exception leads to nothing concrete). A screenshot of the relevant section of a failed trace can be found here:

http://i57.tinypic.com/20acrip.jpg

I also uploaded the complete trace here:

http://pastebin.com/fDt3thvr

Each failed request generates exactly the same log, so the errors we are seeing are consistently caused by the same problem. However, I am not able to determine what this problem is, let alone find a way of fixing it. Even though I have an error code and message, googling them only returns very old topics on issues that were fixed six years ago.

It is pretty important for our business that these messages can be recorded with a high degree of accuracy, but as it stands now I have no further ideas on how to gain better information on what is happening on these servers. We are also not able to replicate this behavior in a controlled environment.

Also, our error logging itself does work properly. 'Normal' errors are logged as expected and we have also verified the HTTP module actually works.
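
For readers who want to picture the setup, a module of the kind described above might look roughly like the following. This is only a minimal sketch, not the actual module from this service: the `Example` namespace, the handler names and the use of `Trace.TraceError` for logging are assumptions made for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Web;

namespace Example // hypothetical namespace, not the one used in the real service
{
    public class UnhandledExceptionModule : IHttpModule
    {
        // Init runs once per HttpApplication instance, so guard the
        // process-wide AppDomain subscription to register it only once.
        private static int _appDomainHooked;

        public void Init(HttpApplication context)
        {
            // Raised when an unhandled exception reaches the request pipeline.
            context.Error += OnRequestError;

            if (Interlocked.Exchange(ref _appDomainHooked, 1) == 0)
            {
                // Raised for exceptions that escape a thread entirely.
                AppDomain.CurrentDomain.UnhandledException += OnUnhandledException;
            }
        }

        private static void OnRequestError(object sender, EventArgs e)
        {
            var application = (HttpApplication)sender;
            Exception ex = application.Server.GetLastError();
            if (ex != null)
            {
                // Assumption: real code would write to its own error log here.
                Trace.TraceError("Request error: {0}", ex);
            }
        }

        private static void OnUnhandledException(object sender, UnhandledExceptionEventArgs e)
        {
            Trace.TraceError("Unhandled exception: {0}", e.ExceptionObject);
        }

        public void Dispose()
        {
            // Nothing to release in this sketch.
        }
    }
}
```

Both handlers only see exceptions that actually reach the request pipeline or escape a thread, so a failure that never surfaces as an exception in managed code would pass by a module like this unnoticed.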

Edit:

The controller pseudo code is as follows:

[HttpPost]
public ActionResult Method(...)
{
    // Set cookie and CORS response, check for early out.
    if (earlyOut)
        return 404;

    // Store received values.
    azuretable.ExecuteAsync(TableOperation.InsertOrMerge(...));

    return 200;
}

Edit2:

I have spent some time analyzing failed request traces, and they mostly seem to be generated by users with IE9. I actually managed to reproduce the error twice by quickly leaving the page while it was loading, so the problem seems to be caused by aborted Ajax calls (most of which we make during page load). Why would an aborted call cause a 500 error, though, instead of being handled neatly?

Do the cookies exceed 4 KB? The same thing happened to us on IIS, and the requests sometimes ended up with a 500 Internal Server Error. The errors were virtually untraceable. I reproduced the issue by simply inflating a cookie over the 4093-byte limit.

I think it is because you are not awaiting your async method call, or you are not returning an awaitable response. I had exactly this issue when I forgot to do that.

await azuretable.ExecuteAsync(TableOperation.InsertOrMerge(...));

Then you should be good. I think you'll find that the async call is finishing after your action has already returned to the caller.
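
To illustrate the difference, here is a small console sketch that leaves out ASP.NET and Azure entirely. `FailingInsertAsync` is a hypothetical stand-in for `azuretable.ExecuteAsync(...)` that always fails after a short delay, and the "200"/"500" strings are just placeholders for the response status. It is meant to show the general behaviour of an unawaited task, not the exact failure seen in the question.

```csharp
using System;
using System.Threading.Tasks;

public static class AwaitDemo
{
    // Hypothetical stand-in for the storage call: fails after a short delay.
    public static async Task FailingInsertAsync()
    {
        await Task.Delay(50);
        throw new InvalidOperationException("simulated storage failure");
    }

    // Fire-and-forget: the method returns before the task has failed,
    // so the catch block never sees the exception.
    public static string FireAndForget()
    {
        try
        {
            _ = FailingInsertAsync(); // task is dropped, not awaited
            return "200";
        }
        catch (InvalidOperationException)
        {
            return "500";
        }
    }

    // Awaited: the method only continues once the task has finished,
    // so the exception surfaces inside the try block.
    public static async Task<string> AwaitedAsync()
    {
        try
        {
            await FailingInsertAsync();
            return "200";
        }
        catch (InvalidOperationException)
        {
            return "500";
        }
    }

    public static void Main()
    {
        Console.WriteLine(FireAndForget());                           // prints "200"
        Console.WriteLine(AwaitedAsync().GetAwaiter().GetResult());   // prints "500"
    }
}
```

In a controller this would correspond to declaring the action as `public async Task<ActionResult> Method(...)` and awaiting the storage call, so that any failure happens while the request is still being processed and can reach the normal error handling.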
