
Why is my website experiencing random slow API requests?

I have a VB.NET/Vue website hosted on an internal IIS 8.5 server running Windows Server 2012 R2. About 30 users in our company are using the site at any given time. Users experience random delays throughout the day, although on some days there are no delays at all (the site works well most of the time). I'm looking for suggestions on where to start investigating. Here's what I've found so far:

  1. A user opens the site and triggers an API request from the UI.
  2. The user sees a loading icon for up to a minute or so while waiting for the request to return.
  3. The request eventually reaches the server, executes within milliseconds, and the response is returned to the user.
  4. By this time, many users have already refreshed the page, and the new requests made on page load succeed. Users who wait patiently eventually get the response.

Here are some screenshots: [screenshots not available]

So, to sum up: several users experience delays every day.

Some days we have no delays at all, but on most days several users hit multiple delays ranging from a few seconds to 30 seconds to a full minute.

I found all of this using LogRocket and New Relic. The requests themselves complete within milliseconds once they reach the server, but for some period of time they don't seem to reach the server at all.

I've been monitoring CPU, memory, and network usage on these servers, and everything looks normal while these issues occur.

It seems the problem lies somewhere between the user's computer and the web server, in whatever hardware or software sits in between.

Update: I found that in all of these instances the problem occurs on the user's computer. Using Google Chrome's Performance API, I tracked the timing data for these requests and found that the delay begins at fetchStart, so whatever happens at that stage is the cause of the issue.
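The collection step described above could be sketched as follows. This is a minimal sketch that assumes each entry carries the standard `PerformanceResourceTiming` fields (`initiatorType`, `fetchStart`, `requestStart`); the `TimingEntry` interface, the `findStalledRequests` helper and the 5000 ms default threshold are hypothetical choices for illustration, not part of the original setup.

```typescript
// Subset of the PerformanceResourceTiming fields used here. This interface is
// hypothetical so the logic can be exercised outside a browser.
interface TimingEntry {
  name: string;
  initiatorType: string;
  fetchStart: number;
  requestStart: number;
}

interface StalledRequest {
  name: string;
  stallMs: number; // time between fetchStart and requestStart
}

// Hypothetical helper: returns XHR/fetch entries whose request was not sent
// until more than `thresholdMs` after fetchStart.
function findStalledRequests(
  entries: readonly TimingEntry[],
  thresholdMs = 5000,
): StalledRequest[] {
  const result: StalledRequest[] = [];
  for (const e of entries) {
    // Only API calls made from script are of interest here.
    if (e.initiatorType !== "xmlhttprequest" && e.initiatorType !== "fetch") {
      continue;
    }
    // requestStart is reported as 0 when detailed timing is unavailable
    // (for example, cross-origin resources without Timing-Allow-Origin).
    if (e.requestStart === 0) {
      continue;
    }
    const stallMs = e.requestStart - e.fetchStart;
    if (stallMs > thresholdMs) {
      result.push({ name: e.name, stallMs });
    }
  }
  return result;
}
```

In the browser, the entries could come from `performance.getEntriesByType("resource")` or from a `PerformanceObserver` observing `{ type: "resource", buffered: true }`, and the flagged results could then be forwarded to whatever logging is already in place.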

Here is the timing entry for one of the slow requests:

    entryType: resource
    startTime: 1119531.820000033
    duration: 56882.43999995757
    initiatorType: xmlhttprequest
    nextHopProtocol: http/1.1
    workerStart: 0
    redirectStart: 0
    redirectEnd: 0
    fetchStart: 1119531.820000033
    domainLookupStart: 1176401.0199999902
    domainLookupEnd: 1176402.2699999623
    connectStart: 1176402.2699999623
    connectEnd: 1176404.8350000521
    secureConnectionStart: 1176403.6700000288
    requestStart: 1176404.8549999716
    responseStart: 1176413.5300000198
    responseEnd: 1176414.2599999905
    transferSize: 15145
    encodedBodySize: 14884
    decodedBodySize: 14884
    serverTiming: []
    workerTiming: []

fetchStart is at 1119531.820000033 and requestStart is at 1176404.8549999716, a gap of roughly 56.9 seconds, so the problem is something between fetchStart and requestStart. I'm still looking into what is causing this.
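To see where inside that gap the time goes, the entry can be split into consecutive Resource Timing phases. This is a sketch only: the `phaseBreakdown` helper and the phase names are hypothetical labels, and the interface lists just the fields the calculation needs (all values in milliseconds relative to the page's time origin).

```typescript
// Resource Timing fields needed for the breakdown.
interface ResourceTimes {
  startTime: number;
  fetchStart: number;
  domainLookupStart: number;
  domainLookupEnd: number;
  connectStart: number;
  connectEnd: number;
  requestStart: number;
  responseStart: number;
  responseEnd: number;
}

// Hypothetical phase labels for one request.
interface Phases {
  beforeDns: number; // fetchStart -> domainLookupStart: time before DNS lookup begins
  dns: number; // domainLookupStart -> domainLookupEnd
  connect: number; // connectStart -> connectEnd (includes the TLS handshake)
  beforeRequest: number; // connectEnd -> requestStart (matters when a connection is reused)
  requestToFirstByte: number; // requestStart -> responseStart
  download: number; // responseStart -> responseEnd
  total: number; // startTime -> responseEnd (should match `duration`)
}

// Hypothetical helper: splits one entry into its timing phases.
function phaseBreakdown(t: ResourceTimes): Phases {
  return {
    beforeDns: t.domainLookupStart - t.fetchStart,
    dns: t.domainLookupEnd - t.domainLookupStart,
    connect: t.connectEnd - t.connectStart,
    beforeRequest: t.requestStart - t.connectEnd,
    requestToFirstByte: t.responseStart - t.requestStart,
    download: t.responseEnd - t.responseStart,
    total: t.responseEnd - t.startTime,
  };
}

// Values copied from the slow entry in the question.
const example: ResourceTimes = {
  startTime: 1119531.820000033,
  fetchStart: 1119531.820000033,
  domainLookupStart: 1176401.0199999902,
  domainLookupEnd: 1176402.2699999623,
  connectStart: 1176402.2699999623,
  connectEnd: 1176404.8350000521,
  requestStart: 1176404.8549999716,
  responseStart: 1176413.5300000198,
  responseEnd: 1176414.2599999905,
};

const phases = phaseBreakdown(example);
console.log(phases);
```

For this entry, about 56,869 ms elapse before the DNS lookup even starts, while DNS (about 1.25 ms), connection setup including TLS (about 2.6 ms) and time to first byte (about 8.7 ms) are all tiny. That is consistent with the request being held on the client side before it is ever sent.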

In 2020, we are experiencing something very similar with a small fraction of our customers. There is a significant gap between the timing API's startTime and requestStart values. This gap can be as long as 8 minutes (I admire the patience of customers who wait that long). The wait periods are also close to multiples of a minute.

In our case, there appears to be a (transparent?) proxy between those browsers and our server infrastructure that triggers the problem. In particular, it forces a downgrade from HTTP/2 to HTTP/1.1. Whitelisting our website in that proxy does solve the problem. It isn't a very satisfying solution, but it does make the customer happier!
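A rough heuristic for spotting this pattern in collected timing data could look like the sketch below. It assumes the site normally negotiates HTTP/2, so an entry whose `nextHopProtocol` is `"http/1.1"` hints at a downgrading intermediary (on a server that only speaks HTTP/1.1 this signal is meaningless). It also checks whether the stall lands near a whole number of minutes. The function names and the 5-second tolerance are hypothetical; this is a triage aid, not proof that a proxy is involved.

```typescript
// Hypothetical tolerance: how close (ms) a stall must be to a whole minute.
const DEFAULT_TOLERANCE_MS = 5000;

interface ProtocolTiming {
  nextHopProtocol: string; // e.g. "h2" or "http/1.1"
  fetchStart: number;
  requestStart: number;
}

// True when `ms` is within `toleranceMs` of a non-zero multiple of 60 s.
function isNearMinuteMultiple(ms: number, toleranceMs = DEFAULT_TOLERANCE_MS): boolean {
  const minutes = Math.round(ms / 60000);
  if (minutes === 0) {
    return false;
  }
  return Math.abs(ms - minutes * 60000) <= toleranceMs;
}

// Heuristic: the request went out over HTTP/1.1 (assumed here to be a
// downgrade from HTTP/2) and stalled for roughly a whole number of minutes.
function looksLikeProxyStall(
  t: ProtocolTiming,
  toleranceMs = DEFAULT_TOLERANCE_MS,
): boolean {
  if (t.requestStart === 0) {
    return false; // detailed timing not available for this entry
  }
  const stallMs = t.requestStart - t.fetchStart;
  return t.nextHopProtocol === "http/1.1" && isNearMinuteMultiple(stallMs, toleranceMs);
}
```

Entries flagged this way could be grouped by user or network location to check whether they all pass through the same intermediary.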
