I'm currently writing a Prometheus exporter for a telemetry network application.
I've read the Writing Exporters documentation and, while I understand the use case for implementing a custom collector to avoid race conditions, I'm not sure whether my use case could fit direct instrumentation instead.
Basically, the network metrics are streamed via gRPC by the network devices, so my exporter just receives them and doesn't actually have to scrape them.
I've used direct instrumentation with the code below:
package metrics

import (
    "github.com/lucabrasi83/prom-high-obs/proto/telemetry"
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var (
    cpu5Sec = promauto.NewGaugeVec(
        prometheus.GaugeOpts{
            Name: "cisco_iosxe_iosd_cpu_busy_5_sec_percentage",
            Help: "The IOSd daemon CPU busy percentage over the last 5 seconds",
        },
        []string{"node"},
    )
)

The gauge is then set in the record function once a telemetry message has been decoded, where val holds the parsed CPU value:

cpu5Sec.WithLabelValues(msg.GetNodeIdStr()).Set(float64(val))
The gRPC stream handler then dispatches each decoded message to the registered record functions:

for {
    req, err := stream.Recv()
    if err == io.EOF {
        return nil
    }
    if err != nil {
        logging.PeppaMonLog(
            "error",
            fmt.Sprintf("Error while reading client %v stream: %v", clientIPSocket, err))
        return err
    }

    data := req.GetData()
    msg := &telemetry.Telemetry{}

    err = proto.Unmarshal(data, msg)
    if err != nil {
        log.Fatalln(err)
    }

    if !logFlag {
        logging.PeppaMonLog(
            "info",
            fmt.Sprintf(
                "Telemetry Subscription Request Received - Client %v - Node %v - YANG Model Path %v",
                clientIPSocket, msg.GetNodeIdStr(), msg.GetEncodingPath(),
            ),
        )
    }
    logFlag = true

    // Flag to determine whether the telemetry device streams an accepted YANG node path.
    yangPathSupported := false

    for _, m := range metrics.CiscoMetricRegistrar {
        if msg.EncodingPath == m.EncodingPath {
            yangPathSupported = true
            go m.RecordMetricFunc(msg)
        }
    }
}
And each metric file registers its YANG encoding path and record function with a registrar in the metrics package:

package metrics

import "github.com/lucabrasi83/prom-high-obs/proto/telemetry"

// CiscoMetricRegistrar holds, for each supported YANG encoding path, the
// function that records the corresponding Prometheus metrics.
var CiscoMetricRegistrar []CiscoTelemetryMetric

type CiscoTelemetryMetric struct {
    EncodingPath     string
    RecordMetricFunc func(msg *telemetry.Telemetry)
}

func init() {
    CiscoMetricRegistrar = append(CiscoMetricRegistrar, CiscoTelemetryMetric{
        EncodingPath:     CpuYANGEncodingPath,
        RecordMetricFunc: ParsePBMsgCpuBusyPercent,
    })
}
I'm using Grafana as the frontend and so far haven't seen any particular discrepancy when correlating the Prometheus-exposed metrics against the metrics shown directly on the device.
So I would like to understand whether this follows Prometheus best practices, or whether I should still go down the custom collector route.
Thanks in advance.
You are not following best practices because you are using the global metrics that the article you linked to cautions against. With your current implementation your dashboard will forever show some arbitrary and constant value for the CPU metric after a device disconnects (or, more precisely, until your exporter is restarted).
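Even if you kept direct instrumentation, you would at least have to delete the stale child series when a stream ends. A minimal sketch, assuming the handler captured the device's node label in a nodeID variable:

// Sketch only: remove this device's child series from the gauge vector when
// the stream handler returns, so it disappears from the scrape output.
// nodeID is assumed to have been captured when the subscription was received.
defer func() {
    cpu5Sec.DeleteLabelValues(nodeID)
}()

DeleteLabelValues is part of the client_golang vector API, but this bookkeeping gets awkward as soon as a device exports more than one series, which is one reason the custom collector below is the cleaner route.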
Instead, the RPC method should maintain a set of local metrics and remove them once the method returns. That way the device's metrics vanish from the scrape output when it disconnects.
Here is one approach to do this. It uses a map that contains currently active metrics. Each map element is the set of metrics for one particular stream (which I understand corresponds to one device). Once the stream ends, that entry is removed.
package main

import (
    "io"
    "sync"

    "github.com/prometheus/client_golang/prometheus"
)

// Exporter is a prometheus.Collector implementation.
type Exporter struct {
    // We need some way to map gRPC streams to their metrics. Using the stream
    // itself as a map key is simple enough, but anything works as long as we
    // can remove metrics once the stream ends.
    sync.Mutex
    Metrics map[StreamServer]*DeviceMetrics
}

type DeviceMetrics struct {
    sync.Mutex
    CPU prometheus.Metric
}

// Globally defined descriptions are fine.
var cpu5SecDesc = prometheus.NewDesc(
    "cisco_iosxe_iosd_cpu_busy_5_sec_percentage",
    "The IOSd daemon CPU busy percentage over the last 5 seconds",
    []string{"node"},
    nil, // constant labels
)

// Collect implements prometheus.Collector.
func (e *Exporter) Collect(ch chan<- prometheus.Metric) {
    // Copy current metrics so we don't lock for very long if ch's consumer is
    // slow.
    var metrics []prometheus.Metric

    e.Lock()
    for _, deviceMetrics := range e.Metrics {
        deviceMetrics.Lock()
        metrics = append(metrics,
            deviceMetrics.CPU,
        )
        deviceMetrics.Unlock()
    }
    e.Unlock()

    for _, m := range metrics {
        if m != nil {
            ch <- m
        }
    }
}

// Describe implements prometheus.Collector.
func (e *Exporter) Describe(ch chan<- *prometheus.Desc) {
    ch <- cpu5SecDesc
}

// Service is the gRPC service implementation.
type Service struct {
    exp *Exporter
}
func (s *Service) RPCMethod(stream StreamServer) (*Response, error) {
    deviceMetrics := new(DeviceMetrics)

    s.exp.Lock()
    s.exp.Metrics[stream] = deviceMetrics
    s.exp.Unlock()

    defer func() {
        // Stop emitting metrics for this stream.
        s.exp.Lock()
        delete(s.exp.Metrics, stream)
        s.exp.Unlock()
    }()

    for {
        req, err := stream.Recv()
        if err == io.EOF {
            break
        }
        if err != nil {
            return nil, err
        }

        var msg *Telemetry = parseRequest(req) // Your existing code that unmarshals the nested message.

        var (
            metricField *prometheus.Metric
            metric      prometheus.Metric
        )
        switch msg.GetEncodingPath() {
        case CpuYANGEncodingPath:
            metricField = &deviceMetrics.CPU
            metric = prometheus.MustNewConstMetric(
                cpu5SecDesc,
                prometheus.GaugeValue,
                ParsePBMsgCpuBusyPercent(msg), // func(*Telemetry) float64
                msg.GetNodeIdStr(),            // label values only; the "node" name comes from the Desc
            )
        default:
            continue
        }

        deviceMetrics.Lock()
        *metricField = metric
        deviceMetrics.Unlock()
    }
    return &Response{}, nil
}
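For completeness, here is a rough sketch of how the pieces could be wired together. RegisterTelemetryServer stands in for whatever registration function your generated gRPC code provides, and the ports are arbitrary; note that the Metrics map must be initialized before the first stream registers itself:

package main

import (
    "log"
    "net"
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
    "google.golang.org/grpc"
)

func main() {
    exp := &Exporter{Metrics: make(map[StreamServer]*DeviceMetrics)}

    // Use a dedicated registry so only this collector's metrics are exposed.
    reg := prometheus.NewRegistry()
    reg.MustRegister(exp)

    // Scrape endpoint for Prometheus.
    http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))
    go func() { log.Fatal(http.ListenAndServe(":9100", nil)) }()

    // gRPC server receiving the device telemetry streams, sharing the exporter.
    lis, err := net.Listen("tcp", ":57000")
    if err != nil {
        log.Fatal(err)
    }
    srv := grpc.NewServer()
    RegisterTelemetryServer(srv, &Service{exp: exp}) // illustrative name for your generated registration func
    log.Fatal(srv.Serve(lis))
}

Prometheus then scrapes :9100/metrics while the devices stream into :57000, and a device's series disappear from the scrape output as soon as its stream ends.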