Healthcheck for Go Applications

A production-ready health check library for Go applications that enables proper monitoring and graceful degradation in
modern cloud environments, especially Kubernetes.
Why Health Checks Matter
Health checks are critical for building resilient, self-healing applications in distributed systems. They provide:
- Automatic Recovery: In Kubernetes, failed health checks trigger automatic pod restarts, ensuring your application
recovers from transient failures without manual intervention.
- Load Balancer Integration: Health checks prevent traffic from being routed to unhealthy instances, maintaining
service quality even during partial outages.
- Graceful Degradation: By monitoring dependencies (databases, caches, external APIs), your application can degrade
gracefully when non-critical services fail.
- Operational Visibility: Health endpoints provide instant insight into system state, making debugging and incident
response faster.
- Zero-Downtime Deployments: Readiness checks ensure new deployments only receive traffic when fully initialized.
Features
- Multiple Check Types: Basic (sync), Manual, and Background (async) checks for different use cases
- Kubernetes Native: Built-in /live and /ready endpoints following k8s conventions
- JSON Status Reports: Detailed health status with history for debugging
- Metrics Integration: Callbacks for Prometheus or other monitoring systems
- Thread-Safe: Concurrent-safe operations with proper synchronization
- Graceful Shutdown: Proper cleanup of background checks and shutdown signaling
- Check History: Last 5 states stored for each check for debugging
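The check history feature can be pictured as a small bounded buffer. The sketch below is only an illustration of that idea, not the library's implementation; the names `history`, `state` and `maxHistory` are hypothetical.

```go
package main

import "fmt"

// maxHistory mirrors the "last 5 states" limit mentioned in the feature list.
const maxHistory = 5

// state is a hypothetical record of a single check result.
type state struct {
	Seq    int    // sequence number of the check run
	Status string // "up" or "down"
}

// history keeps at most maxHistory of the most recent states, oldest first.
type history struct {
	items []state
}

// add appends a state and drops the oldest entries once the limit is exceeded.
func (h *history) add(s state) {
	h.items = append(h.items, s)
	if len(h.items) > maxHistory {
		h.items = h.items[len(h.items)-maxHistory:]
	}
}

func main() {
	var h history
	for i := 1; i <= 7; i++ {
		h.add(state{Seq: i, Status: "up"})
	}
	// Runs 1 and 2 have been evicted, so the oldest retained run is 3.
	fmt.Println(len(h.items), h.items[0].Seq) // 5 3
}
```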
Installation
go get -u github.com/kazhuravlev/healthcheck
Quick Start
package main

import (
	"context"
	"errors"
	"math/rand"
	"time"

	"github.com/kazhuravlev/healthcheck"
)

func main() {
	ctx := context.TODO()

	// 1. Create healthcheck instance
	hc, _ := healthcheck.New()

	// 2. Register a simple check
	hc.Register(ctx, healthcheck.NewBasic("redis", time.Second, func(ctx context.Context) error {
		if rand.Float64() > 0.5 {
			return errors.New("service is not available")
		}
		return nil
	}))

	// 3. Start HTTP server
	server, _ := healthcheck.NewServer(hc, healthcheck.WithPort(8080))
	_ = server.Run(ctx)

	// 4. Check health at http://localhost:8080/ready
	select {}
}
Types of Health Checks
1. Basic Checks (Synchronous)
Basic checks run on-demand when the /ready endpoint is called. Use these for:
- Fast operations (< 1 second)
- Checks that need fresh data
- Low-cost operations
// Database connectivity check
dbCheck := healthcheck.NewBasic("postgres", time.Second, func(ctx context.Context) error {
	return db.PingContext(ctx)
})
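To make the role of the timeout argument concrete, here is a standard-library-only sketch of running a check function on demand under a deadline. It is an assumption about the general technique, not a description of how the library does it; `runWithTimeout`, `demoFast`, `demoSlow` and `timedOut` are hypothetical names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// runWithTimeout runs check once under a deadline. If the check does not
// return in time, the caller gets context.DeadlineExceeded.
func runWithTimeout(timeout time.Duration, check func(ctx context.Context) error) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	done := make(chan error, 1) // buffered so the goroutine can always finish
	go func() { done <- check(ctx) }()

	select {
	case err := <-done:
		return err
	case <-ctx.Done():
		return ctx.Err()
	}
}

// demoFast simulates a quick dependency ping that succeeds.
func demoFast() error {
	return runWithTimeout(time.Second, func(ctx context.Context) error { return nil })
}

// demoSlow simulates a dependency that takes longer than the allowed timeout.
func demoSlow() error {
	return runWithTimeout(20*time.Millisecond, func(ctx context.Context) error {
		select {
		case <-time.After(2 * time.Second):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	})
}

// timedOut reports whether err was caused by the deadline.
func timedOut(err error) bool { return errors.Is(err, context.DeadlineExceeded) }

func main() {
	fmt.Println(demoFast())           // <nil>
	fmt.Println(timedOut(demoSlow())) // true
}
```

A check that honours its context, as `db.PingContext(ctx)` does, returns promptly once the deadline passes instead of holding up the /ready response.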
2. Background Checks (Asynchronous)
Background checks run periodically in a separate goroutine. Use these for:
- Expensive operations (API calls, complex queries)
- Checks with rate limits (when the check must run less often than Kubernetes requests /ready)
- Operations that can use slightly stale data
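// External API health check, runs every 30 seconds.
// client is assumed to be an *http.Client created elsewhere.
apiCheck := healthcheck.NewBackground(
	"payment-api",
	nil,            // initial error state
	5*time.Second,  // initial delay
	30*time.Second, // check interval
	5*time.Second,  // timeout per check
	func(ctx context.Context) error {
		// Bind the request to ctx so it is cancelled when the check's context expires
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, "https://api.payment.com/health", nil)
		if err != nil {
			return err
		}
		resp, err := client.Do(req)
		if err != nil {
			return err
		}
		defer resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			return fmt.Errorf("unhealthy: unexpected status %d", resp.StatusCode)
		}
		return nil
	},
)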
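The parameters above (initial state, initial delay, interval, per-check timeout) map naturally onto a timer loop that caches its latest result. The following is a minimal sketch of that pattern using only the standard library; it is not the library's code, and `cachedCheck`, `run` and `demoBackground` are hypothetical names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// cachedCheck stores the most recent result of a periodically executed check.
type cachedCheck struct {
	mu  sync.RWMutex
	err error
}

// set stores the latest result.
func (c *cachedCheck) set(err error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.err = err
}

// Err returns the last stored result without running the check.
func (c *cachedCheck) Err() error {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.err
}

// run executes check after delay and then once per interval until ctx is cancelled.
func (c *cachedCheck) run(ctx context.Context, delay, interval, timeout time.Duration, check func(context.Context) error) {
	timer := time.NewTimer(delay)
	defer timer.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-timer.C:
		}
		checkCtx, cancel := context.WithTimeout(ctx, timeout)
		c.set(check(checkCtx))
		cancel()
		timer.Reset(interval)
	}
}

// demoBackground returns the state before the first run and after at least one run.
func demoBackground() (before, after error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	ran := make(chan struct{}, 1)
	c := &cachedCheck{err: errors.New("not checked yet")} // initial error state
	before = c.Err()

	go c.run(ctx, 5*time.Millisecond, 5*time.Millisecond, time.Second, func(context.Context) error {
		select { // non-blocking signal that a run has started
		case ran <- struct{}{}:
		default:
		}
		return nil
	})

	// Once a second signal arrives, the result of an earlier run has been stored.
	<-ran
	<-ran
	return before, c.Err()
}

func main() {
	before, after := demoBackground()
	fmt.Println("before:", before) // before: not checked yet
	fmt.Println("after:", after)   // after: <nil>
}
```

Readers of `Err()` never wait for the dependency, which is why this style suits expensive or rate-limited checks.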
3. Manual Checks
Manual checks are controlled by your application logic. Use these for:
- Initialization states (cache warming, data loading)
- Circuit breaker patterns
- Feature flags
// Cache warming check
cacheCheck := healthcheck.NewManual("cache-warmed")
hc.Register(ctx, cacheCheck)
// Set unhealthy during startup
cacheCheck.SetErr(errors.New("cache warming in progress"))
// After cache is warmed
cacheCheck.SetErr(nil)
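For the circuit breaker use case, a small helper can translate consecutive failures into manual check state. This is a hedged sketch: `breaker`, `report` and `errUpstream` are hypothetical, and the only thing assumed about the library is that a manual check exposes `SetErr(error)` as used above, which is passed in as the `setErr` callback.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errUpstream is a stand-in for a real dependency error.
var errUpstream = errors.New("upstream timeout")

// breaker is a hypothetical helper that marks a manual check unhealthy
// after several consecutive failures of a protected dependency.
type breaker struct {
	mu        sync.Mutex
	threshold int
	failures  int
	setErr    func(error) // for example, the SetErr method of a manual check
}

// report records the outcome of one call to the protected dependency.
func (b *breaker) report(err error) {
	b.mu.Lock()
	defer b.mu.Unlock()

	if err == nil {
		// Any success closes the circuit again.
		b.failures = 0
		b.setErr(nil)
		return
	}
	b.failures++
	if b.failures >= b.threshold {
		b.setErr(fmt.Errorf("circuit open after %d consecutive failures: %w", b.failures, err))
	}
}

func main() {
	var current error // stands in for the state held by the manual check
	b := &breaker{threshold: 3, setErr: func(err error) { current = err }}

	b.report(errUpstream)
	b.report(errUpstream)
	fmt.Println(current) // <nil>

	b.report(errUpstream)
	fmt.Println(current) // circuit open after 3 consecutive failures: upstream timeout

	b.report(nil)
	fmt.Println(current) // <nil>
}
```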
Best Practices
1. Choose the Right Check Type
| Scenario | Check Type | Why |
|---|---|---|
| Database ping | Basic | Fast, needs fresh data |
| File system check | Basic | Fast, local operation |
| External API health | Background | Expensive, rate-limited |
| Message queue depth | Background | Metrics query, can be stale |
| Cache warmup status | Manual | Application-controlled state |
2. Set Appropriate Timeouts
// ❌ Bad: A long timeout blocks readiness. Keep it shorter than the probe timeout configured in k8s
healthcheck.NewBasic("db", 30*time.Second, checkFunc)
// ✅ Good: Short timeout
healthcheck.NewBasic("db", 1*time.Second, checkFunc)
3. Use Status Codes Correctly
4. Add Context to Errors
func checkDatabase(ctx context.Context) error {
	if err := db.PingContext(ctx); err != nil {
		// Use fmt.Errorf to add context. It will be available in /ready report
		return fmt.Errorf("postgres connection failed: %w", err)
	}
	return nil
}
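Wrapping with `%w` adds context without hiding the original error, so callers can still match it with `errors.Is`. The runnable sketch below shows this; `errConnRefused`, the `ping` parameter and `isConnRefused` are hypothetical stand-ins so the example works without a database.

```go
package main

import (
	"errors"
	"fmt"
)

// errConnRefused is a stand-in for an error returned by a database driver.
var errConnRefused = errors.New("connection refused")

// checkDatabase wraps any ping failure with the name of the dependency.
func checkDatabase(ping func() error) error {
	if err := ping(); err != nil {
		return fmt.Errorf("postgres connection failed: %w", err)
	}
	return nil
}

// isConnRefused reports whether err wraps errConnRefused.
func isConnRefused(err error) bool { return errors.Is(err, errConnRefused) }

func main() {
	err := checkDatabase(func() error { return errConnRefused })
	fmt.Println(err)                // postgres connection failed: connection refused
	fmt.Println(isConnRefused(err)) // true
}
```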
5. Graceful Shutdown
For applications that need to signal they are shutting down (preventing new traffic while completing existing requests),
use the Shutdown() method:
// Create healthcheck instance
hc, _ := healthcheck.New()

// Register your normal checks
hc.Register(ctx, healthcheck.NewBasic("database", time.Second, checkDB))

// Start HTTP server
server, _ := healthcheck.NewServer(hc, healthcheck.WithPort(8080))
_ = server.Run(ctx)

// In your graceful shutdown handler
func gracefulShutdown(hc *healthcheck.Healthcheck) {
	// Mark application as shutting down - /ready will return 500
	hc.Shutdown()

	// Continue with your normal shutdown process
	// - Stop accepting new requests
	// - Complete existing requests
	// - Close database connections, etc.
}
What happens after Shutdown():
- /ready endpoint immediately returns HTTP 500 with status "down"
- A special __shutting_down__ check is added to the response
- Kubernetes will stop routing new traffic to this pod
- /live endpoint continues to return 200 OK (pod should not be restarted)
Use this pattern for:
- Zero-downtime deployments
- Graceful pod termination in Kubernetes
- Maintenance mode activation
- When you need to drain traffic before shutdown
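One way to wire this into a process is to wait for a termination signal, mark the application as shutting down, pause so the orchestrator can observe the failing readiness probe, and only then stop the main server. The sketch below uses only the standard library and hypothetical names (`shutdownSequence`, `runUntilSignal`, `demoOrder`); in a real service the first callback would be `hc.Shutdown` and the last one something like `(*http.Server).Shutdown`. The length of the drain pause is an assumption to tune against your probe period and failure threshold.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// shutdownSequence marks the app as not ready, waits for the drain period,
// then stops the server with a bounded deadline.
func shutdownSequence(markShuttingDown func(), drain, limit time.Duration, stopServer func(context.Context) error) error {
	markShuttingDown() // readiness starts failing from here on
	time.Sleep(drain)  // let the readiness probe observe the failure

	ctx, cancel := context.WithTimeout(context.Background(), limit)
	defer cancel()
	return stopServer(ctx)
}

// runUntilSignal blocks until SIGINT or SIGTERM arrives, then runs the sequence.
func runUntilSignal(markShuttingDown func(), drain, limit time.Duration, stopServer func(context.Context) error) error {
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	<-ctx.Done()
	return shutdownSequence(markShuttingDown, drain, limit, stopServer)
}

// demoOrder records the order in which the shutdown steps run.
func demoOrder() []string {
	var steps []string
	err := shutdownSequence(
		func() { steps = append(steps, "not-ready") },
		10*time.Millisecond,
		time.Second,
		func(context.Context) error {
			steps = append(steps, "server-stopped")
			return nil
		},
	)
	if err != nil {
		steps = append(steps, "error: "+err.Error())
	}
	return steps
}

func main() {
	// A real service would call runUntilSignal here; the demo runs the
	// sequence directly so that it terminates on its own.
	fmt.Println(demoOrder()) // [not-ready server-stopped]
}
```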
6. Monitor Checks
hc, _ := healthcheck.New(
	healthcheck.WithCheckStatusHook(func(name string, status healthcheck.Status) {
		// hcMetric can be a prometheus metric - it is up to your infrastructure
		hcMetric.WithLabelValues(name, string(status)).Set(1)
	}),
)
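When the status is used as a gauge label, a series that was set to 1 for an earlier status stays at 1 unless it is reset. The sketch below shows one way to handle that with an in-memory stand-in for a labelled gauge. Everything here is hypothetical (`gaugeSet`, `onStatus`, `knownStatuses`), and the status list assumes only the "up" and "down" values that appear in the /ready report.

```go
package main

import (
	"fmt"
	"sync"
)

// knownStatuses lists the status values assumed for this sketch.
var knownStatuses = []string{"up", "down"}

// gaugeSet is an in-memory stand-in for a gauge labelled by check name and status.
type gaugeSet struct {
	mu     sync.Mutex
	values map[string]float64
}

func newGaugeSet() *gaugeSet { return &gaugeSet{values: make(map[string]float64)} }

// key builds the map key for one (name, status) label pair.
func key(name, status string) string { return name + "/" + status }

// onStatus has the shape of a status hook: it sets the current status to 1
// and every other known status of the same check to 0.
func (g *gaugeSet) onStatus(name, status string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	for _, s := range knownStatuses {
		v := 0.0
		if s == status {
			v = 1
		}
		g.values[key(name, s)] = v
	}
}

// value returns the gauge value for one (name, status) label pair.
func (g *gaugeSet) value(name, status string) float64 {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.values[key(name, status)]
}

func main() {
	g := newGaugeSet()
	g.onStatus("postgres", "up")
	g.onStatus("postgres", "down")
	fmt.Println(g.value("postgres", "up"), g.value("postgres", "down")) // 0 1
}
```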
Complete Example
package main

import (
	"context"
	"database/sql"
	"fmt"
	"log"
	"time"

	"github.com/kazhuravlev/healthcheck"
	_ "github.com/lib/pq"
)

func main() {
	ctx := context.Background()

	// Initialize dependencies
	db, err := sql.Open("postgres", "postgres://localhost/myapp")
	if err != nil {
		log.Fatal(err)
	}

	// Create healthcheck
	hc, _ := healthcheck.New()

	// 1. Database check - synchronous, critical
	hc.Register(ctx, healthcheck.NewBasic("postgres", time.Second, func(ctx context.Context) error {
		return db.PingContext(ctx)
	}))

	// 2. Cache warmup - manual control
	cacheReady := healthcheck.NewManual("cache")
	hc.Register(ctx, cacheReady)
	cacheReady.SetErr(fmt.Errorf("warming up"))

	// 3. External API - background check
	hc.Register(ctx, healthcheck.NewBackground(
		"payment-provider",
		nil,
		10*time.Second, // initial delay
		30*time.Second, // check interval
		5*time.Second,  // timeout
		checkPaymentProvider,
	))

	// Start health check server
	server, _ := healthcheck.NewServer(hc, healthcheck.WithPort(8080))
	if err := server.Run(ctx); err != nil {
		log.Fatal(err)
	}

	// Simulate cache warmup completion
	go func() {
		time.Sleep(5 * time.Second)
		cacheReady.SetErr(nil)
		log.Println("Cache warmed up")
	}()

	// Graceful shutdown example
	go func() {
		time.Sleep(30 * time.Second)
		log.Println("Initiating graceful shutdown...")
		hc.Shutdown() // /ready will now return 500, stopping new traffic
		log.Println("Application marked as shutting down")
	}()

	log.Println("Health checks available at:")
	log.Println(" - http://localhost:8080/live")
	log.Println(" - http://localhost:8080/ready")
	select {}
}

func checkPaymentProvider(ctx context.Context) error {
	// Implementation of payment provider check
	return nil
}
Integration with Kubernetes
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: app
      livenessProbe:
        httpGet:
          path: /live
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 10
        timeoutSeconds: 5
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 5
        timeoutSeconds: 3
        failureThreshold: 2
The /ready endpoint returns detailed JSON with check history:
Healthy application:
{
  "status": "up",
  "checks": [
    {
      "name": "postgres",
      "state": {
        "status": "up",
        "error": "",
        "timestamp": "2024-01-15T10:30:00Z"
      },
      "history": [
        {
          "status": "up",
          "error": "",
          "timestamp": "2024-01-15T10:29:55Z"
        }
      ]
    }
  ]
}
Application shutting down:
{
  "status": "down",
  "checks": [
    {
      "name": "postgres",
      "state": {
        "status": "up",
        "error": "",
        "timestamp": "2024-01-15T10:30:00Z"
      }
    },
    {
      "name": "__shutting_down__",
      "state": {
        "status": "down",
        "error": "The application in shutting down process",
        "timestamp": "2024-01-15T10:30:05Z"
      },
      "history": null
    }
  ]
}
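A report in this shape is easy to consume from tooling. The sketch below decodes it with encoding/json and lists the checks that are not "up". The struct types are inferred from the sample JSON above and are assumptions, not types exported by the library; `failingChecks` and `sampleReport` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// checkState mirrors the "state" and "history" entries of the sample report.
type checkState struct {
	Status    string    `json:"status"`
	Error     string    `json:"error"`
	Timestamp time.Time `json:"timestamp"`
}

// checkReport mirrors one element of the "checks" array.
type checkReport struct {
	Name    string       `json:"name"`
	State   checkState   `json:"state"`
	History []checkState `json:"history"`
}

// report mirrors the top-level /ready response.
type report struct {
	Status string        `json:"status"`
	Checks []checkReport `json:"checks"`
}

// failingChecks returns the names of all checks whose current status is not "up".
func failingChecks(body []byte) ([]string, error) {
	var r report
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, fmt.Errorf("decode readiness report: %w", err)
	}
	var names []string
	for _, c := range r.Checks {
		if c.State.Status != "up" {
			names = append(names, c.Name)
		}
	}
	return names, nil
}

// sampleReport is a compact copy of the "shutting down" response shown above.
const sampleReport = `{
  "status": "down",
  "checks": [
    {"name": "postgres",
     "state": {"status": "up", "error": "", "timestamp": "2024-01-15T10:30:00Z"}},
    {"name": "__shutting_down__",
     "state": {"status": "down", "error": "The application in shutting down process", "timestamp": "2024-01-15T10:30:05Z"},
     "history": null}
  ]
}`

func main() {
	names, err := failingChecks([]byte(sampleReport))
	fmt.Println(names, err) // [__shutting_down__] <nil>
}
```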