Must-Have Prometheus Metrics for Monitoring Golang Microservices in Kubernetes



Microservices built with Golang and deployed on Kubernetes offer scalability, flexibility, and fast delivery. However, a distributed system is harder to observe than a single application, so robust monitoring is needed to maintain uptime, performance, and visibility across services.

Prometheus, a popular open-source monitoring system, is the de facto choice for collecting metrics in Kubernetes environments. In this post, we explore the most critical Prometheus metrics to track when monitoring Golang microservices running in Kubernetes.


1. Application Metrics

Application-level metrics help monitor the health, behavior, and performance of each microservice.

a. Request Count (http_requests_total)

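Track the number of incoming HTTP requests by endpoint, method, and status code. Useful for traffic analysis and for alerting on abnormal usage patterns. This is a counter you define in your own service (the name follows Prometheus naming conventions; it is not built in). Use the route template, not the raw URL path, as the endpoint label so the number of time series stays bounded.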

b. Request Duration (http_request_duration_seconds)

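Measure how long each request takes to complete, in seconds as Prometheus conventions recommend. Recorded as a histogram, it lets you track latency trends and estimate percentiles such as the 95th or 99th at query time, e.g. histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))).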

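c. Error Rate (5xx share of http_requests_total)

Keep an eye on the ratio of 5xx responses to total requests. Because http_requests_total is a cumulative counter, compute the ratio over a recent window with rate() rather than dividing the raw counters, which would give a lifetime average: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])). A spike in this ratio can indicate backend issues or degraded services.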

d. Request Size and Response Size

Track payload sizes to understand bandwidth usage and detect anomalies in request/response behavior.


2. Custom Business Logic Metrics

These metrics are specific to your business processes or service responsibilities.

a. Job Execution Metrics

For background jobs, track execution count, duration, and success/failure rates. This helps diagnose slow or failing batch processes.

b. Queue/Message Handling Metrics

If your services consume messages from Kafka, RabbitMQ, or another broker, track:

  • Number of messages consumed
  • Processing time per message
  • Failed message count

c. Resource-Specific Metrics

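For services tied to specific entities (e.g., users, orders, payments), track aggregate values such as the ones below. Keep unbounded identifiers like user or order IDs out of label values, since every distinct label value creates a new time series: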

  • Active user sessions
  • Orders processed per minute
  • Payment success/failure counts

3. Go Runtime Metrics

The Prometheus Go client (client_golang) registers a Go runtime collector by default, exposing go_* metrics that give insight into the performance and health of your services.

a. Goroutines (go_goroutines)

Shows the current number of goroutines. A number that keeps growing under steady load usually indicates a goroutine leak, typically goroutines blocked forever on a channel, lock, or network call.

b. GC Pauses (go_gc_duration_seconds)

A summary of garbage-collection pause durations, reported as quantiles. Frequent or long GC pauses add latency to in-flight requests.

c. Memory Usage

  • go_memstats_alloc_bytes – currently allocated heap memory
  • go_memstats_heap_objects – number of objects in heap
  • go_memstats_next_gc_bytes – memory threshold for the next GC

d. CPU Stats

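Container CPU usage is best taken from cAdvisor (container_cpu_usage_seconds_total) and node-level CPU from node-exporter. On the Go side, the client's default registry adds process_cpu_seconds_total and go_threads (OS threads created), and newer client versions can be configured to export scheduler data from runtime/metrics, which helps explain thread scheduling and blocking behavior.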


4. Kubernetes Metrics

These are metrics exposed by Kubernetes components and exporters (like kube-state-metrics, cAdvisor, and node-exporter).

a. Pod Availability and Status

  • kube_pod_status_phase – tracks pod phase (Pending, Running, Failed, etc.)
  • kube_pod_container_status_ready – tells you if containers are ready to serve traffic

b. Resource Requests and Limits

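  • kube_pod_container_resource_requests{resource="cpu"} (kube-state-metrics v2 and later; 1.x releases exposed kube_pod_container_resource_requests_cpu_cores)
  • kube_pod_container_resource_limits{resource="memory"} (1.x releases: kube_pod_container_resource_limits_memory_bytes)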

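These report the configured requests and limits, not actual consumption. Compare them with cAdvisor usage metrics such as container_cpu_usage_seconds_total and container_memory_working_set_bytes to see whether pods are sized correctly.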

c. Container Restarts

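  • kube_pod_container_status_restarts_total – cumulative restart count per container; a non-zero increase over a recent window (for example, the last hour) helps identify unstable services with frequent crashes.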

d. Node Metrics

Track node-level metrics like:

  • node_cpu_seconds_total
  • node_memory_MemAvailable_bytes
  • node_disk_io_time_seconds_total

Essential for identifying infrastructure bottlenecks.


5. Service Discovery and Endpoint Health

Monitor how services discover each other and whether their endpoints are healthy.

a. Endpoint Availability

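  • kube_endpoint_address_available – number of ready addresses behind a Service's Endpoints object; a value of 0 means the Service has no live pods to route traffic to
  • kube_service_spec_type – records each Service's type (ClusterIP, NodePort, LoadBalancer) as a label, useful for filtering and joining rather than for alerting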

6. Network and Traffic Metrics

Traffic-related metrics are crucial in a microservices environment where services frequently talk to each other.

a. Inter-Service Call Count and Latency

Track how many calls each service makes to its dependencies and how long they take. Useful for locating bottlenecks or failing dependencies.

b. Network I/O

  • container_network_receive_bytes_total
  • container_network_transmit_bytes_total

Monitor inbound and outbound traffic per container.


7. Availability and Uptime Metrics

Always include metrics that let you verify SLA compliance and service availability.

a. Service Uptime

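  • Track a per-service liveness signal. Prometheus records one automatically: the up series for every scrape target (1 if the last scrape succeeded, 0 if it failed). A self-reported gauge such as app_up cannot report 0 when the process is down, so rely on up instead. For black-box monitoring from outside the pod, the Blackbox Exporter's probe_success metric serves the same purpose.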

b. Alerting Rules

Set thresholds and rules for:

  • High request latency
  • Increased error rates
  • Pod/container restarts
  • Memory/CPU saturation

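Prometheus evaluates alerting rules for these conditions and sends firing alerts to Alertmanager, which deduplicates, groups, and routes them to receivers such as email or chat.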


8. Database and External Dependency Metrics

If your services depend on databases or third-party APIs, track these as well.

a. Database Connection Metrics

  • Open connections
  • Query durations
  • Failed queries

b. External API Latency and Failure Rates

Monitor the availability and response times of 3rd-party services.


Conclusion

Monitoring Golang microservices in Kubernetes isn’t just about collecting data — it’s about collecting the right data. The metrics listed above provide a solid foundation for observing service performance, maintaining uptime, and ensuring reliable deployments at scale.

By integrating these metrics with Prometheus and visualizing them using Grafana, your team can gain real-time insights and detect issues before they impact users.

