Must-Have Prometheus Metrics for Monitoring Golang Microservices in Kubernetes

Microservices built with Golang and deployed on Kubernetes scale well and ship quickly. However, a distributed system adds complexity, and it takes robust monitoring to maintain uptime, performance, and visibility across services.

Prometheus, a popular open-source monitoring system, is the go-to choice for collecting metrics from Kubernetes environments. In this post, we explore the most critical Prometheus metrics you should track to monitor your Golang microservices running in Kubernetes.

1. Application Metrics

Application-level metrics help monitor the health, behavior, and performance of each microservice.

a. Request Count (`http_requests_total`)

Track the number of incoming HTTP requests per endpoint, method, or status code. Useful for traffic analysis and alerting on abnormal usage patterns.
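To make the idea of a labeled counter concrete, here is a minimal, dependency-free sketch. The `requestCounter` type and the `path`/`method`/`status` label names are assumptions for illustration only; in a real service the `CounterVec` type from the official Prometheus Go client would normally play this role.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// requestCounter is a hypothetical stand-in for a labeled Prometheus counter:
// one monotonically increasing value per combination of label values.
type requestCounter struct {
	mu     sync.Mutex
	counts map[[3]string]uint64 // key: {path, method, status}
}

func newRequestCounter() *requestCounter {
	return &requestCounter{counts: make(map[[3]string]uint64)}
}

// Inc records one request for the given endpoint, method, and status code.
func (c *requestCounter) Inc(path, method, status string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[[3]string{path, method, status}]++
}

// Render returns the samples in the Prometheus text exposition format,
// sorted so that the output is stable between calls.
func (c *requestCounter) Render() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	lines := make([]string, 0, len(c.counts))
	for k, v := range c.counts {
		lines = append(lines, fmt.Sprintf(
			"http_requests_total{path=%q,method=%q,status=%q} %d",
			k[0], k[1], k[2], v))
	}
	sort.Strings(lines)
	return "# TYPE http_requests_total counter\n" + strings.Join(lines, "\n") + "\n"
}

func main() {
	c := newRequestCounter()
	c.Inc("/orders", "GET", "200")
	c.Inc("/orders", "GET", "200")
	c.Inc("/pay", "POST", "500")
	fmt.Print(c.Render())
}
```

Each distinct label combination becomes its own time series, so keep label values bounded (status codes and route templates rather than raw URLs or user IDs).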

b. Request Duration (`http_request_duration_seconds`)

Measure how long each request takes to complete. Recording durations as a histogram lets you derive latency percentiles (e.g., the 95th or 99th) with `histogram_quantile` instead of relying on averages, which hide slow outliers.
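The sketch below illustrates how a percentile can be estimated from cumulative histogram buckets by interpolating linearly inside the bucket that contains the target rank. It is a simplified illustration of the idea behind PromQL's `histogram_quantile`, not a copy of its implementation; the `bucket` type and the sample bucket boundaries are assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// bucket mirrors one cumulative histogram bucket: the number of observations
// less than or equal to upperBound (the "le" label in Prometheus).
type bucket struct {
	upperBound float64 // seconds; +Inf for the last bucket
	cumulative float64 // observations with duration <= upperBound
}

// estimateQuantile approximates quantile q (0 < q <= 1) from buckets sorted
// by upperBound, interpolating linearly inside the target bucket.
func estimateQuantile(q float64, buckets []bucket) float64 {
	if q <= 0 || q > 1 || len(buckets) == 0 {
		return math.NaN()
	}
	total := buckets[len(buckets)-1].cumulative
	if total == 0 {
		return math.NaN()
	}
	rank := q * total
	for i, b := range buckets {
		if b.cumulative < rank {
			continue
		}
		if math.IsInf(b.upperBound, 1) {
			// The rank falls in the open-ended bucket, so the best
			// available answer is the highest finite bound.
			if i == 0 {
				return math.NaN()
			}
			return buckets[i-1].upperBound
		}
		lower, prev := 0.0, 0.0
		if i > 0 {
			lower, prev = buckets[i-1].upperBound, buckets[i-1].cumulative
		}
		return lower + (b.upperBound-lower)*(rank-prev)/(b.cumulative-prev)
	}
	return math.NaN()
}

// sampleBuckets returns made-up data: 100 requests spread over five buckets.
func sampleBuckets() []bucket {
	return []bucket{
		{0.1, 50}, {0.25, 80}, {0.5, 95}, {1, 99}, {math.Inf(1), 100},
	}
}

func main() {
	b := sampleBuckets()
	fmt.Printf("p50=%.3fs p90=%.3fs\n", estimateQuantile(0.5, b), estimateQuantile(0.9, b))
}
```

Because the estimate is interpolated, its accuracy depends on the bucket boundaries; choose them so that the latencies you alert on fall between buckets that sit close together.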

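c. Error Rate (5xx share of `http_requests_total`)

Keep an eye on the ratio of 5xx responses to total requests. Because `http_requests_total` is a counter, compare rates over a time window rather than raw values, e.g. `sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))`. A spike in this ratio can indicate backend issues or degraded services.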
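As a rough illustration of why the ratio is computed over a window, the following sketch derives an error ratio from two successive counter readings. The `sample` type and its field names are hypothetical, and counter resets are only handled in the simplest way.

```go
package main

import "fmt"

// sample is a hypothetical pair of counter readings taken at one scrape.
type sample struct {
	total  float64 // http_requests_total summed over all status codes
	errors float64 // http_requests_total summed over 5xx status codes
}

// errorRatio returns the share of requests that failed with a 5xx status
// between two scrapes. Counters only grow, so the difference between two
// readings is the traffic seen in that window.
func errorRatio(prev, curr sample) float64 {
	requests := curr.total - prev.total
	failed := curr.errors - prev.errors
	if requests <= 0 || failed < 0 {
		// No traffic in the window, or a counter was reset by a restart.
		return 0
	}
	return failed / requests
}

func main() {
	prev := sample{total: 1000, errors: 10}
	curr := sample{total: 1200, errors: 20}
	fmt.Printf("error ratio over the window: %.2f\n", errorRatio(prev, curr))
}
```

Note that the lifetime ratio in this example (20 of 1200) would understate the recent ratio (10 of 200), which is why windowed rates are preferred for alerting.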

d. Request Size and Response Size

Track payload sizes to understand bandwidth usage and detect anomalies in request/response behavior.

2. Custom Business Logic Metrics

These metrics are specific to your business processes or service responsibilities.

a. Job Execution Metrics

For background jobs, track execution count, duration, and success/failure rates. This helps diagnose slow or failing batch processes.
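One way to collect these three signals is to wrap every job in a small helper that times it and records the outcome. The sketch below keeps the values in memory; `jobStats`, `Track`, and `errExample` are hypothetical names, and a real service would feed the same values into a counter and a histogram.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// errExample is a placeholder failure used by the demo below.
var errExample = errors.New("upstream timeout")

// jobStats holds execution count, failure count, and cumulative duration.
type jobStats struct {
	mu       sync.Mutex
	runs     uint64
	failures uint64
	total    time.Duration
}

// Track runs job, measures how long it took, and records whether it failed.
// The job's error is returned unchanged so callers can still handle it.
func (s *jobStats) Track(job func() error) error {
	start := time.Now()
	err := job()
	elapsed := time.Since(start)

	s.mu.Lock()
	defer s.mu.Unlock()
	s.runs++
	s.total += elapsed
	if err != nil {
		s.failures++
	}
	return err
}

// Snapshot returns a consistent copy of the recorded values.
func (s *jobStats) Snapshot() (runs, failures uint64, total time.Duration) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.runs, s.failures, s.total
}

func main() {
	var stats jobStats
	stats.Track(func() error { return nil })
	stats.Track(func() error { return errExample })

	runs, failures, total := stats.Snapshot()
	fmt.Printf("runs=%d failures=%d total=%s\n", runs, failures, total)
}
```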

b. Queue/Message Handling Metrics

If your services consume messages from Kafka, RabbitMQ, or another broker, track:

Number of messages consumed
Processing time per message
Failed message count

c. Resource-Specific Metrics

For services tied to specific entities (e.g., users, orders, payments), track:

Active user sessions
Orders processed per minute
Payment success/failure counts

3. Go Runtime Metrics

The Prometheus Go client library (`client_golang`) registers a Go collector in its default registry, so services that expose that registry report runtime metrics that give insight into their performance and health.

a. Goroutines (`go_goroutines`)

Shows the current number of active goroutines. A steadily growing number may indicate a resource leak or blocking operations.
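The value behind `go_goroutines` is the count returned by `runtime.NumGoroutine`. The sketch below shows how workers that never exit push that number up; `leakWorkers` and `goroutineCount` are hypothetical helpers written for this illustration.

```go
package main

import (
	"fmt"
	"runtime"
)

// goroutineCount returns the number of goroutines that currently exist.
func goroutineCount() int {
	return runtime.NumGoroutine()
}

// leakWorkers starts n goroutines that block until stop is closed,
// imitating workers that were started but never shut down.
func leakWorkers(n int, stop <-chan struct{}) {
	for i := 0; i < n; i++ {
		go func() { <-stop }()
	}
}

func main() {
	before := goroutineCount()

	stop := make(chan struct{})
	leakWorkers(100, stop)
	after := goroutineCount()

	fmt.Printf("goroutines before=%d after=%d\n", before, after)
	close(stop) // release the workers so they can exit
}
```

On a dashboard, the equivalent symptom is a `go_goroutines` line that climbs with traffic and never returns to its baseline.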

b. GC Pauses (`go_gc_duration_seconds`)

Indicates how long garbage collection is taking. Frequent or long GC pauses can degrade performance.
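For a feel of the underlying data, the runtime reports GC cycle counts and stop-the-world pause times through `runtime.MemStats`. The sketch below reads them directly; the `gcSummary` type is an assumption, and the exact runtime source that the client library reads for `go_gc_duration_seconds` may differ between versions.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// gcSummary holds a few GC figures read from the runtime.
type gcSummary struct {
	cycles     uint32        // completed GC cycles
	totalPause time.Duration // cumulative stop-the-world pause time
	lastPause  time.Duration // most recent pause
}

// readGC takes a snapshot of the runtime's GC statistics.
func readGC() gcSummary {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	s := gcSummary{cycles: m.NumGC, totalPause: time.Duration(m.PauseTotalNs)}
	if m.NumGC > 0 {
		// PauseNs is a circular buffer; the most recent pause
		// is stored at index (NumGC+255)%256.
		s.lastPause = time.Duration(m.PauseNs[(m.NumGC+255)%256])
	}
	return s
}

// forceAndRead runs one full collection and then takes a snapshot.
func forceAndRead() gcSummary {
	runtime.GC()
	return readGC()
}

func main() {
	s := forceAndRead()
	fmt.Printf("cycles=%d total_pause=%s last_pause=%s\n",
		s.cycles, s.totalPause, s.lastPause)
}
```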

c. Memory Usage

go_memstats_alloc_bytes – currently allocated heap memory
go_memstats_heap_objects – number of objects in heap
go_memstats_next_gc_bytes – memory threshold for the next GC
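These three metrics correspond to fields of `runtime.MemStats`. As a small sketch, the hypothetical `memoryGauges` helper below reads the same fields directly, which can be handy when checking a value locally.

```go
package main

import (
	"fmt"
	"runtime"
	"sort"
)

// memoryGauges reads the runtime fields that back the metrics listed above.
func memoryGauges() map[string]uint64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return map[string]uint64{
		"go_memstats_alloc_bytes":   m.Alloc,       // bytes of live heap objects
		"go_memstats_heap_objects":  m.HeapObjects, // number of live heap objects
		"go_memstats_next_gc_bytes": m.NextGC,      // heap size that triggers the next GC
	}
}

func main() {
	gauges := memoryGauges()

	// Sort the names so the output order is stable.
	names := make([]string, 0, len(gauges))
	for name := range gauges {
		names = append(names, name)
	}
	sort.Strings(names)

	for _, name := range names {
		fmt.Printf("%s %d\n", name, gauges[name])
	}
}
```

A heap that repeatedly climbs toward `go_memstats_next_gc_bytes` and drops after each collection is normal; a baseline that keeps rising after collections is the pattern worth investigating.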

d. CPU Stats

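Basic CPU usage should come from Kubernetes (cAdvisor) and node-exporter, but the Go client adds process-level context: process_cpu_seconds_total tracks the CPU time consumed by the process, and go_threads counts the OS threads created by the runtime, which tends to grow when goroutines block in system calls.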

4. Kubernetes Metrics

These are metrics exposed by Kubernetes components and exporters (like kube-state-metrics, cAdvisor, and node-exporter).

a. Pod Availability and Status

kube_pod_status_phase – tracks pod phase (Pending, Running, Failed, etc.)
kube_pod_container_status_ready – tells you if containers are ready to serve traffic

b. Resource Requests and Limits

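kube_pod_container_resource_requests – resources requested per container (filter with resource="cpu")
kube_pod_container_resource_limits – resource limits per container (filter with resource="memory")

kube-state-metrics releases before v2 exposed these as kube_pod_container_resource_requests_cpu_cores and kube_pod_container_resource_limits_memory_bytes.

These metrics describe what pods ask for, not what they use. Compare them against actual usage from cAdvisor (container_cpu_usage_seconds_total, container_memory_working_set_bytes) to check whether pods consume the CPU and memory they request.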

c. Container Restarts

kube_pod_container_status_restarts_total
Helps identify unstable services with frequent crashes.

d. Node Metrics

Track node-level metrics like:

node_cpu_seconds_total
node_memory_MemAvailable_bytes
node_disk_io_time_seconds_total

Essential for identifying infrastructure bottlenecks.

5. Service Discovery and Endpoint Health

Monitor how services discover each other and whether their endpoints are healthy.

a. Endpoint Availability

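kube_endpoint_address_available – number of ready addresses behind an endpoint, confirming that services route traffic to live pods (recent kube-state-metrics releases deprecate it in favor of kube_endpoint_address)
kube_service_spec_type – reports each service's type (ClusterIP, NodePort, LoadBalancer)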

6. Network and Traffic Metrics

Traffic-related metrics are crucial in a microservices environment where services frequently talk to each other.

a. Inter-Service Call Count and Latency

Track how many calls are made between services, and how long they take. Useful for tracing bottlenecks or failed dependencies.
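On the client side, one common approach is to wrap the `http.RoundTripper` used for outbound calls so that every request is counted and timed per target. The sketch below is a minimal version of that idea: all type names are hypothetical, `inventory.internal` is a made-up host, and the downstream service is faked in-process so the example runs without a network.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

// targetStats accumulates figures for calls to one downstream host.
type targetStats struct {
	Calls    int
	Failures int
	Total    time.Duration
}

// callRecorder stores stats per target host.
type callRecorder struct {
	mu      sync.Mutex
	targets map[string]*targetStats
}

func (r *callRecorder) record(host string, elapsed time.Duration, failed bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.targets == nil {
		r.targets = make(map[string]*targetStats)
	}
	s, ok := r.targets[host]
	if !ok {
		s = &targetStats{}
		r.targets[host] = s
	}
	s.Calls++
	s.Total += elapsed
	if failed {
		s.Failures++
	}
}

// instrumentedTransport wraps another RoundTripper and records every call.
type instrumentedTransport struct {
	next     http.RoundTripper
	recorder *callRecorder
}

func (t *instrumentedTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	start := time.Now()
	resp, err := t.next.RoundTrip(req)
	// Transport errors and 5xx responses both count as failed calls.
	failed := err != nil || resp.StatusCode >= 500
	t.recorder.record(req.URL.Host, time.Since(start), failed)
	return resp, err
}

// roundTripFunc lets a plain function act as a RoundTripper.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(req *http.Request) (*http.Response, error) { return f(req) }

// demo makes four calls to a fake downstream that fails every second request.
func demo() targetStats {
	n := 0
	fake := roundTripFunc(func(req *http.Request) (*http.Response, error) {
		n++
		status := http.StatusOK
		if n%2 == 0 {
			status = http.StatusInternalServerError
		}
		return &http.Response{StatusCode: status, Body: http.NoBody, Request: req}, nil
	})

	rec := &callRecorder{}
	client := &http.Client{Transport: &instrumentedTransport{next: fake, recorder: rec}}
	for i := 0; i < 4; i++ {
		resp, err := client.Get("http://inventory.internal/items")
		if err == nil {
			resp.Body.Close()
		}
	}
	return *rec.targets["inventory.internal"]
}

func main() {
	s := demo()
	fmt.Printf("calls=%d failures=%d total_latency=%s\n", s.Calls, s.Failures, s.Total)
}
```

In production code, `next` would typically be `http.DefaultTransport`, and the recorded values would be exported as a counter and a histogram labeled by target service.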

b. Network I/O

container_network_receive_bytes_total
container_network_transmit_bytes_total

Monitor inbound and outbound traffic per container.

7. Availability and Uptime Metrics

Always include metrics that help ensure SLAs and service availability.

a. Service Uptime

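Track a simple gauge like app_up (1 if alive, 0 if down) per service. A service that is down cannot report 0 itself, so pair it with the `up` series that Prometheus records automatically for every scrape target (1 if the scrape succeeded, 0 if it failed). Both are useful for black-box monitoring or readiness checks.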

b. Alerting Metrics

Set thresholds and rules for:

High request latency
Increased error rates
Pod/container restarts
Memory/CPU saturation

These help create actionable alerts via Alertmanager.

8. Database and External Dependency Metrics

If your services depend on databases or third-party APIs, track these as well.

a. Database Connection Metrics

Open connections
Query durations
Failed queries
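For connection counts, Go's `database/sql` package already tracks pool state, which `DB.Stats()` returns as a `sql.DBStats` value. The sketch below maps such a snapshot to metric name/value pairs. The metric names are illustrative rather than an established convention, and `sampleStats` returns made-up numbers so the example runs without a database.

```go
package main

import (
	"database/sql"
	"fmt"
	"sort"
	"time"
)

// poolGauges converts a connection-pool snapshot into metric name/value pairs.
func poolGauges(s sql.DBStats) map[string]float64 {
	return map[string]float64{
		"db_open_connections":      float64(s.OpenConnections), // in use + idle
		"db_in_use_connections":    float64(s.InUse),
		"db_idle_connections":      float64(s.Idle),
		"db_wait_count_total":      float64(s.WaitCount), // connections waited for
		"db_wait_duration_seconds": s.WaitDuration.Seconds(),
	}
}

// sampleStats returns made-up figures; a real service would call db.Stats().
func sampleStats() sql.DBStats {
	return sql.DBStats{
		OpenConnections: 5,
		InUse:           3,
		Idle:            2,
		WaitCount:       7,
		WaitDuration:    1500 * time.Millisecond,
	}
}

func main() {
	gauges := poolGauges(sampleStats())

	// Sort the names so the output order is stable.
	names := make([]string, 0, len(gauges))
	for name := range gauges {
		names = append(names, name)
	}
	sort.Strings(names)

	for _, name := range names {
		fmt.Printf("%s %g\n", name, gauges[name])
	}
}
```

A steadily growing wait count or wait duration suggests the pool is too small for the load or that queries hold connections for too long. Query durations and failed queries still need to be measured around the query calls themselves.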

b. External API Latency and Failure Rates

Monitor the availability and response times of 3rd-party services.

Conclusion

Monitoring Golang microservices in Kubernetes isn’t just about collecting data — it’s about collecting the right data. The metrics listed above provide a solid foundation for observing service performance, maintaining uptime, and ensuring reliable deployments at scale.

By integrating these metrics with Prometheus and visualizing them using Grafana, your team can gain real-time insights and detect issues before they impact users.
