We have 3 PostgreSQL databases in GCP's Cloud SQL, and all three are backed up daily. I need to use Grafana to monitor those backups and alert whe ...
I have an endpoint for a REST API that checks for the existence of a request (or a list of requests). It can return 200 OK if there is an order in progress ...
I am trying to use a rate() query to compare the last 10 min with the previous 50 min, like: (sum by() rate(cmd_get{}[10m]) / (sum by() rate(cmd_get{}[5 ...
I tried to add this to my alertmanager.yml at the root level, but I got this error: yaml: unmarshal errors: field time_intervals not found in type config ...
I feel this is a rather basic question, but somehow I'm unable to find a good answer. Recently, auditors have been complaining about the Role Based Access C ...
I have an endpoint POST /upload that uploads a file to my storage. The response time depends on the file size (the bigger the file, the longer it tak ...
I'm trying to manually OOM-kill pods for testing purposes. Does anyone know how I can achieve this? ...
Question 1: ____________ are a treasure trove of information for debugging issues. a) Databases b) Infrastructure as Code tools c) Networks d) L ...
Assuming Aerospike is running, I need some conditions I can use to check whether the Aerospike cluster is idle and not being used at all. I tried checki ...
So I'm using Puppet 3 and I have X.yaml and Y.yaml. X.yaml has profiles::resolv_conf::nameservers: [ '1.1.1.1', '8.8.8.8', '2.2.2.2' ] in it. I want to ...
We are using version 1.14.3 of Flink, and when we try to run the JobManager we get the exception below. I tried entering akka.remote.netty.tcp.hostn ...
In the case of server-side rendering, we know that TTFB is the time between the start of the request and the start of the response. My question i ...
I'm trying to understand how SRE differs from DevOps, and I came across this statement from Benjamin Treynor Sloss, Google VP, in his SRE book (SRE Book ...
In Kubernetes we can set limits and requests for CPU. If the container exceeds the limit, from my understanding it will be throttled. However if the c ...
We deploy resources in our Azure tenant through Jenkins, which uses Terraform to provision infrastructure resources, and we use a service principal for authentica ...
Quote: "SREs at 50% of their time. Their remaining time should be spent using their coding skills on project work." (page 7) I'm reading this book, ...
I need help with my Jenkinsfile CI file. The code in the Jenkinsfile looks like this: It was running fine when the Container Image stage was not present af ...
I'm looking to write a Prometheus rule to constantly check the message queue length (Exim mail relay), which is the total number of files in a directory ...
I would like to create an alert in Prometheus for a REST API, if the API is not available 99% of the time. I am new to Prometheus expressions. Could yo ...
Here is the current JMX exporter pattern: Current output: Which actually works fine. But to improve cardinality we decided not to expose 0.0 val ...