Microservices built with Golang and deployed via Kubernetes provide unmatched scalability, flexibility, and speed. However, the complexity that comes with a distributed system requires robust monitoring to ensure uptime, performance, and visibility across services.
Prometheus, a popular open-source monitoring system, is a widely used choice for collecting metrics from Kubernetes environments. In this post, we explore the most critical Prometheus metrics you should track to monitor your Golang microservices running in Kubernetes, with the focus on what to measure rather than on implementation detail.
1. Application Metrics
Application-level metrics help monitor the health, behavior, and performance of each microservice.
a. Request Count (http_requests_total)
Track the number of incoming HTTP requests per endpoint, method, or status code. Useful for traffic analysis and alerting on abnormal usage patterns.
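To make the idea concrete, here is a minimal, dependency-free sketch of a request counter labeled by method, path, and status. It is an illustration only: the names `requestCounter`, `statusRecorder`, `instrument`, and `demo` are hypothetical, and a real service would normally use the official Prometheus Go client rather than a hand-rolled counter.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
	"strings"
	"sync"
)

// requestCounter is a hand-rolled stand-in for a labeled Prometheus counter.
type requestCounter struct {
	mu     sync.Mutex
	counts map[string]uint64 // key: rendered label set
}

func newRequestCounter() *requestCounter {
	return &requestCounter{counts: make(map[string]uint64)}
}

// inc adds one to the series identified by method, path, and status code.
// Raw paths are used here for brevity; route templates keep cardinality lower.
func (c *requestCounter) inc(method, path string, status int) {
	key := fmt.Sprintf(`method=%q,path=%q,status="%d"`, method, path, status)
	c.mu.Lock()
	c.counts[key]++
	c.mu.Unlock()
}

// render writes the counter in the Prometheus text exposition format.
func (c *requestCounter) render() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	keys := make([]string, 0, len(c.counts))
	for k := range c.counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	b.WriteString("# TYPE http_requests_total counter\n")
	for _, k := range keys {
		fmt.Fprintf(&b, "http_requests_total{%s} %d\n", k, c.counts[k])
	}
	return b.String()
}

// statusRecorder captures the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// instrument wraps a handler and counts every request it serves.
func instrument(c *requestCounter, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, r)
		c.inc(r.Method, r.URL.Path, rec.status)
	})
}

// demo sends three requests through an instrumented handler and returns
// the resulting exposition text.
func demo() string {
	c := newRequestCounter()
	h := instrument(c, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/fail" {
			w.WriteHeader(http.StatusInternalServerError)
			return
		}
		fmt.Fprint(w, "ok")
	}))
	for _, p := range []string{"/orders", "/orders", "/fail"} {
		h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, p, nil))
	}
	return c.render()
}

func main() {
	fmt.Print(demo())
}
```

Wrapping the handler as middleware keeps the counting logic out of the business code, so every endpoint is counted the same way.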
b. Request Duration (http_request_duration_seconds)
Measure how long each request takes to complete. Histogram metrics allow you to track response times and latency trends (e.g., 95th or 99th percentile).
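The sketch below shows, under simplifying assumptions, how cumulative histogram buckets lead to a percentile estimate. The `histogram` type, its bucket bounds, and the sample durations are hypothetical; the interpolation follows the same general idea as PromQL's `histogram_quantile`, but it is not a copy of that implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// histogram mimics a Prometheus histogram: each bucket counts observations
// less than or equal to its upper bound (cumulative "le" buckets).
type histogram struct {
	bounds []float64 // sorted upper bounds in seconds
	counts []uint64  // cumulative count per bound
	total  uint64    // count of all observations (the +Inf bucket)
	sum    float64   // sum of observed values (the _sum series)
}

func newHistogram(bounds ...float64) *histogram {
	sort.Float64s(bounds)
	return &histogram{bounds: bounds, counts: make([]uint64, len(bounds))}
}

// observe records one request duration in seconds.
func (h *histogram) observe(seconds float64) {
	for i, b := range h.bounds {
		if seconds <= b {
			h.counts[i]++
		}
	}
	h.total++
	h.sum += seconds
}

// quantile estimates the q-th quantile (0 <= q <= 1) by linear interpolation
// inside the bucket that contains the target rank.
func (h *histogram) quantile(q float64) float64 {
	if h.total == 0 || len(h.bounds) == 0 {
		return 0
	}
	rank := q * float64(h.total)
	lower, below := 0.0, 0.0
	for i, b := range h.bounds {
		c := float64(h.counts[i])
		if c >= rank && c > below {
			return lower + (b-lower)*(rank-below)/(c-below)
		}
		lower, below = b, c
	}
	// The rank falls in the +Inf bucket: report the highest finite bound.
	return h.bounds[len(h.bounds)-1]
}

// demoHistogram fills a histogram with ten made-up request durations.
func demoHistogram() *histogram {
	h := newHistogram(0.1, 0.25, 0.5, 1)
	for _, d := range []float64{0.05, 0.05, 0.05, 0.05, 0.05, 0.2, 0.2, 0.2, 0.4, 0.9} {
		h.observe(d)
	}
	return h
}

func main() {
	h := demoHistogram()
	fmt.Printf("p50=%.2fs p95=%.2fs\n", h.quantile(0.5), h.quantile(0.95))
}
```

Because the estimate is interpolated within a bucket, its accuracy depends on choosing bucket bounds close to the latencies you care about.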
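c. Error Rate (http_requests_total{status=~"5.."} / http_requests_total)
Keep an eye on the ratio of 5xx errors to total requests. In PromQL, compute the ratio from per-second rates rather than raw counter values, for example sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])); the 5-minute window here is only an example. A spike in error rate can indicate backend issues or degraded services.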
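The arithmetic behind an error ratio can be sketched in plain Go. The example assumes two hypothetical scrapes of a counter keyed by status code; the `sample` type and the numbers are made up for illustration.

```go
package main

import "fmt"

// sample is one scrape of a request counter, keyed by status code.
type sample map[string]float64

// errorRatio returns the share of 5xx responses among all requests served
// between two scrapes. Counters only grow, so the difference between two
// scrapes is the traffic seen in that window.
func errorRatio(prev, cur sample) float64 {
	var total, errors float64
	for status, v := range cur {
		delta := v - prev[status]
		if delta < 0 { // counter reset after a restart: count from zero
			delta = v
		}
		total += delta
		if len(status) == 3 && status[0] == '5' {
			errors += delta
		}
	}
	if total == 0 {
		return 0
	}
	return errors / total
}

func main() {
	prev := sample{"200": 900, "404": 50, "500": 10}
	cur := sample{"200": 1080, "404": 60, "500": 20}
	fmt.Printf("5xx ratio: %.3f\n", errorRatio(prev, cur))
}
```

Working on deltas rather than lifetime totals is what makes a recent spike visible; a ratio of raw totals would be dominated by historical traffic.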
d. Request Size and Response Size
Track payload sizes to understand bandwidth usage and detect anomalies in request/response behavior.
2. Custom Business Logic Metrics
These metrics are specific to your business processes or service responsibilities.
a. Job Execution Metrics
For background jobs, track execution count, duration, and success/failure rates. This helps diagnose slow or failing batch processes.
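As a rough sketch, a small wrapper can record these three values around any job function. The `jobStats` type and the `succeed` and `fail` jobs are hypothetical, and the struct is not safe for concurrent use as written.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// jobStats holds the values suggested above for one kind of job.
type jobStats struct {
	runs     uint64
	failures uint64
	busy     time.Duration // total time spent across all runs
}

// run executes job, then records the outcome and how long it took.
func (s *jobStats) run(job func() error) error {
	start := time.Now()
	err := job()
	s.busy += time.Since(start)
	s.runs++
	if err != nil {
		s.failures++
	}
	return err
}

var errJob = errors.New("job failed")

// succeed and fail are placeholder jobs for the demonstration.
func succeed() error { return nil }
func fail() error    { return errJob }

func main() {
	var stats jobStats
	for _, job := range []func() error{succeed, fail, succeed} {
		_ = stats.run(job)
	}
	fmt.Printf("runs=%d failures=%d total_duration=%s\n", stats.runs, stats.failures, stats.busy)
}
```

Exposing the run and failure counts as counters and the duration as a histogram would let Prometheus derive success rates and latency percentiles per job.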
b. Queue/Message Handling Metrics
If your services consume messages from Kafka, RabbitMQ, or another broker, track:
- Number of messages consumed
- Processing time per message
- Failed message count
c. Resource-Specific Metrics
For services tied to specific entities (e.g., users, orders, payments), track:
- Active user sessions
- Orders processed per minute
- Payment success/failure counts
3. Go Runtime Metrics
The Prometheus Go client library exports several Go runtime metrics by default. They give insight into the performance and health of your services.
a. Goroutines (go_goroutines)
Shows the current number of active goroutines. A steadily growing number may indicate a resource leak or blocking operations.
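The value behind this gauge comes from `runtime.NumGoroutine`. The following sketch simulates a leak to show how the number grows; `leak` and `goroutineGrowth` are hypothetical helpers written for this illustration.

```go
package main

import (
	"fmt"
	"runtime"
)

// leak starts n goroutines that block until stop is closed, simulating
// workers that never finish.
func leak(n int, stop <-chan struct{}) {
	for i := 0; i < n; i++ {
		go func() { <-stop }()
	}
}

// goroutineGrowth reports how many goroutines were added while f ran.
func goroutineGrowth(f func()) int {
	before := runtime.NumGoroutine()
	f()
	return runtime.NumGoroutine() - before
}

func main() {
	stop := make(chan struct{})
	defer close(stop)
	grew := goroutineGrowth(func() { leak(100, stop) })
	fmt.Printf("goroutines added: %d\n", grew)
}
```

In production the same pattern shows up as a gauge that climbs with traffic and never returns to its baseline.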
b. GC Pauses (go_gc_duration_seconds)
A summary of garbage collection pause durations. Frequent or long GC pauses can degrade performance.
c. Memory Usage
- go_memstats_alloc_bytes: currently allocated heap memory
- go_memstats_heap_objects: number of objects in heap
- go_memstats_next_gc_bytes: memory threshold for the next GC
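These three gauges correspond to fields of `runtime.MemStats`. The sketch below reads them directly from the runtime; the `memoryGauges` helper is hypothetical, and the mapping from field to metric name is stated as an assumption about how the values line up, not as the client library's exact implementation.

```go
package main

import (
	"fmt"
	"runtime"
)

// memoryGauges reads the runtime statistics that correspond to the three
// gauges listed above.
func memoryGauges() map[string]uint64 {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return map[string]uint64{
		"go_memstats_alloc_bytes":   ms.Alloc,       // bytes of live heap objects
		"go_memstats_heap_objects":  ms.HeapObjects, // number of live heap objects
		"go_memstats_next_gc_bytes": ms.NextGC,      // heap size target for the next GC
	}
}

func main() {
	g := memoryGauges()
	for _, name := range []string{
		"go_memstats_alloc_bytes",
		"go_memstats_heap_objects",
		"go_memstats_next_gc_bytes",
	} {
		fmt.Printf("%s %d\n", name, g[name])
	}
}
```

Watching the allocated bytes against the next-GC threshold gives a rough picture of how close the service is to triggering another collection.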
d. CPU Stats
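Basic CPU usage is best taken from Kubernetes (cAdvisor) and node-exporter, but process-level metrics such as process_cpu_seconds_total and go_threads (the number of OS threads created) can help you understand thread scheduling and blocking behavior.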
4. Kubernetes Metrics
These are metrics exposed by Kubernetes components and exporters (like kube-state-metrics, cAdvisor, and node-exporter).
a. Pod Availability and Status
- kube_pod_status_phase: tracks pod phase (Pending, Running, Failed, etc.)
- kube_pod_container_status_ready: tells you if containers are ready to serve traffic
b. Resource Requests and Limits
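kube_pod_container_resource_requests_cpu_cores
kube_pod_container_resource_limits_memory_bytes
These report the CPU and memory that each container requests and is limited to. Newer kube-state-metrics releases (v2 and later) expose the same data as kube_pod_container_resource_requests and kube_pod_container_resource_limits with a resource label. Compare them with actual usage from cAdvisor, such as container_cpu_usage_seconds_total and container_memory_working_set_bytes, to check whether pods consume what you expect.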
c. Container Restarts
kube_pod_container_status_restarts_total
Helps identify unstable services with frequent crashes.
d. Node Metrics
Track node-level metrics like:
node_cpu_seconds_total
node_memory_MemAvailable_bytes
node_disk_io_time_seconds_total
Essential for identifying infrastructure bottlenecks.
5. Service Discovery and Endpoint Health
Monitor how services discover each other and whether their endpoints are healthy.
a. Endpoint Availability
- kube_endpoint_address_available: ensures services are routing traffic to live pods
- kube_service_spec_type: monitors different service types (ClusterIP, NodePort, LoadBalancer)
6. Network and Traffic Metrics
Traffic-related metrics are crucial in a microservices environment where services frequently talk to each other.
a. Inter-Service Call Count and Latency
Track how many calls are made between services, and how long they take. Useful for tracing bottlenecks or failed dependencies.
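One way to capture these numbers on the calling side is to wrap the HTTP client's transport. The sketch below is a minimal illustration with hypothetical names (`callStats`, `measuredTransport`, `fakeDownstream`) and a made-up host; the fake downstream replaces a real service so that the example runs without any network access.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

// callStats accumulates outbound call metrics for one downstream service.
type callStats struct {
	mu       sync.Mutex
	calls    uint64
	failures uint64        // transport errors and 5xx responses
	total    time.Duration // summed time until response headers arrived
}

// measuredTransport wraps another RoundTripper and records every call.
type measuredTransport struct {
	next  http.RoundTripper
	stats *callStats
}

func (t *measuredTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	start := time.Now()
	resp, err := t.next.RoundTrip(req)
	elapsed := time.Since(start)

	t.stats.mu.Lock()
	defer t.stats.mu.Unlock()
	t.stats.calls++
	t.stats.total += elapsed
	if err != nil || resp.StatusCode >= 500 {
		t.stats.failures++
	}
	return resp, err
}

// fakeDownstream stands in for a real service so the sketch needs no network.
type fakeDownstream struct{}

func (fakeDownstream) RoundTrip(req *http.Request) (*http.Response, error) {
	status := http.StatusOK
	if req.URL.Path == "/down" {
		status = http.StatusServiceUnavailable
	}
	return &http.Response{
		StatusCode: status,
		Header:     make(http.Header),
		Body:       http.NoBody,
		Request:    req,
	}, nil
}

// demo makes one successful and one failing call and returns the counts.
func demo() (calls, failures uint64) {
	stats := &callStats{}
	client := &http.Client{Transport: &measuredTransport{next: fakeDownstream{}, stats: stats}}
	for _, path := range []string{"/ok", "/down"} {
		resp, err := client.Get("http://inventory.example" + path)
		if err == nil {
			resp.Body.Close()
		}
	}
	return stats.calls, stats.failures
}

func main() {
	calls, failures := demo()
	fmt.Printf("calls=%d failures=%d\n", calls, failures)
}
```

Instrumenting at the transport level means every request made through the client is measured, including ones added later by other developers.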
b. Network I/O
container_network_receive_bytes_total
container_network_transmit_bytes_total
Monitor inbound and outbound traffic per container.
7. Availability and Uptime Metrics
Always include metrics that help ensure SLAs and service availability.
a. Service Uptime
- Track a simple gauge like app_up (1 if alive, 0 if down) per service. This is useful for black-box monitoring or readiness checks.
b. Alerting Metrics
Set thresholds and rules for:
- High request latency
- Increased error rates
- Pod/container restarts
- Memory/CPU saturation
These help create actionable alerts via Alertmanager.
8. Database and External Dependency Metrics
If your services depend on databases or third-party APIs, track these as well.
a. Database Connection Metrics
- Open connections
- Query durations
- Failed queries
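For connection counts, Go's `database/sql` package already keeps pool statistics in `sql.DBStats`. The sketch below renders a few of them in the Prometheus text format; the metric names and the `poolMetrics` and `sampleStats` helpers are illustrative assumptions, and query durations and failed queries would still need to be measured around each query.

```go
package main

import (
	"database/sql"
	"fmt"
)

// poolMetrics renders connection pool statistics in the Prometheus text
// exposition format. The metric names are illustrative, not a standard.
func poolMetrics(s sql.DBStats) string {
	return fmt.Sprintf(
		"db_open_connections %d\n"+
			"db_in_use_connections %d\n"+
			"db_idle_connections %d\n"+
			"db_wait_count_total %d\n",
		s.OpenConnections, s.InUse, s.Idle, s.WaitCount)
}

// sampleStats returns made-up numbers. A real service would pass db.Stats()
// from its *sql.DB; a literal keeps the sketch runnable without a driver.
func sampleStats() sql.DBStats {
	return sql.DBStats{OpenConnections: 8, InUse: 5, Idle: 3, WaitCount: 2}
}

func main() {
	fmt.Print(poolMetrics(sampleStats()))
}
```

A rising wait count alongside a pool that is fully in use usually means the service needs more connections or faster queries.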
b. External API Latency and Failure Rates
Monitor the availability and response times of third-party services.
Conclusion
Monitoring Golang microservices in Kubernetes isn’t just about collecting data — it’s about collecting the right data. The metrics listed above provide a solid foundation for observing service performance, maintaining uptime, and ensuring reliable deployments at scale.
By integrating these metrics with Prometheus and visualizing them using Grafana, your team can gain real-time insights and detect issues before they impact users.