简体   繁体   中英

IIS 8.5 throwing 400 Bad request in Production when calling REST WS using Post method

I've a strange issue. Let me explain in details below step by step:

  1. I've a vendor developed REST WS (Made using WCF) for synchronizing data with MS CRM.

  2. I've developed a windows service, which fetches batches of data to be synchronized from a database and then pass it to this web services using Post method as JSON objects. The windows service is deployed on one of the node.

  3. The issue I am facing have never occurred in Dev, QA, UAT or staging environment. It's unique to Production environment only.

  4. In production the application works for sometime and then starts throwing 400 Bad request error. Then until we restarts the site or reset the App Pool Identity the IIS keeps throwing 400 Bad request errors. When we restarts the site or App Pool then the same requests which were getting failed starts getting successful responses. It works for sometime like this and than again 400 starts happening.

  5. The environment where web services are hosted is a Win Server 2012, 2 node load balanced environment. WS is deployed on port 8080 on boht nodes and configured to run under .Net 4.0.

  6. I received the following error in my Windows Service log which is the client for these WS.

System.Net.WebException: The remote server returned an error: (400) Bad Request. at SspToCrmSynchronizationService.Helpers.CrmWrapperWsHelper.CallService(String data, String url, String method, String userName, String password, String contentType) in CrmWrapperWsHelper.cs:line 79 at SspToCrmSynchronizationService.Helpers.CrmWrapperWsHelper.CallDocumentCreateService(String data) in CrmWrapperWsHelper.cs:line 20 at SspToCrmSynchronizationService.Process.CommonOperations.GenerateJsonAndInvokeDocCreateWS(Int64 appRefNo, Application app) in CommonOperations.cs:line 52 at SspToCrmSynchronizationService.Process.SequentialProcess.Process(List`1 appList, DatabaseHelper dbHelperForChildTask, CancellationToken ct) in SequentialProcess.cs:line 88

  1. First we've checked the IIS Logs and found that IIS is returning the 400 error in just few 100 MS. We suspect that it's not reaching to WS application since application is not logging any thing at all, despite that Logging the request is the first thing vendor is doing in the WS code.

  2. Secondly we used Fiddler to capture the request and response and get the following:

     HTTP/1.1 400 Bad Request Cache-Control: private Content-Length: 1647 Content-Type: text/html Server: Microsoft-IIS/8.5 X-ASpNet-Version: 4.0.30319 X-Powered-By: ASP.Net Date: Tue, 17 Oct 2017 07:14:26 GMT 

提琴手日志

  1. Than we checked the IIS Httperr.log. In the log we found the following for some of the request and not for every failed requests. It seems nothing is there.

2017-07-07 03:32:45 10.102.2.52 63726 10.102.2.52 8080 - - - - - Timer_ConnectionIdle -
2017-07-08 22:46:55 10.102.2.52 50916 10.102.2.52 8080 - - - - - Timer_ConnectionIdle - 2017-07-08 22:55:09 10.102.2.52 51004 10.102.2.52 8080 - - - - - Timer_ConnectionIdle -

  1. Than we configured the Failed Traced Log in IIS for 400 and got one warning in the trace log when this 400 Error is thrown. I've erased some of the data from the image due to NDA and security reasons.

IIS失败的400错误请求的跟踪日志

Basically the Warning details are as follows:

 124. MODULE_SET_RESPONSE_ERROR_STATUS ModuleName="ManagedPipelineHandler", Notification="EXECUTE_REQUEST_HANDLER", HttpStatus="400", HttpReason="Bad Request", HttpSubStatus="0", ErrorCode="The operation completed successfully. (0x0)", ConfigExceptionInfo="" 
  1. After that I've compared one error case and one success case. Below is the image. What I noticed is that in case of failure General_Read_Entity_Start and General_Read_Entity_End didn't occurred at all.

错误和成功案例

The maximum I can understand is that somehow IIS is not able to parse JSON String to some Entity and the AppPool gets crashes then starts throwing 400 errors until a AppPool or IIS Reset is not done. I've no idea what is causing this (Root Cause) and how to resolve this and why it's working initially and not working after some time. Any help would be highly appreciated.

[Edits]

  1. Resource consumption on the server is less then 10%.
  2. For successful cases the average response time of WS are 5 secs, while for Error cases it returns within 100 Milli Seconds.
  3. We did the Stress test for the service in test for about 100+ request and all went ok.

Is the time of "stop working" the same after restarts ? Is it changing linear to the assumed traffic the service went through ? Have you stress/spike tested the service ? Have you monitored the resources of the hosting server ?

IF it only appears to happen at Prod , which shall not be different from the Test servers, expect that the Prod is used by unknown quantity of sources. First I would make sure that resources are nothing to do with this. ( Something wild: if not violating user rights mirror the in requests of prod to one of your test servers with similar capabilities and see what happens )

We need to understand how it works, there were quite a few conditions:

  1. We were passing a datetime value in JSON. At the WS end the DateTime value was throwing some parsing errors when The WS container in our case (IIS and WCF) trying to pass the DateTime field to the application was failing in doing the conversion. I believe this could be due to locale. We fixed it by changing datetime to string in the WS accepted JSON.

  2. The second issue was that our vendor used WCF as a technology to create RestAPI. One behavior of WCF is that if a request from a client causes a fatal exception to occour in the WS container the IIS will register this client on a block list and will not forward the request from the same client to the application until an IIS restart is done. The IIS will keep returning us Bad Request status message.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM