Skip to main content

Golang Error Handling Code Review Checklist

Core Principles

✅ Errors as Values

  • Errors are treated as values, not control flow mechanisms
  • No panics for expected failures or recoverable conditions
  • Error handling follows Go's idiomatic if err != nil pattern

✅ Error Propagation Strategy

✅ Production Reality: Different layers have different logging responsibilities

Three-Layer Approach:

  • Domain/Business Logic: Return enriched errors without logging (focus on correctness)
  • Service/Controller Layer: Log AND return for operational visibility (production requirement)
  • Transport Layer: Always log with full request context (debugging essential)

Error Context Requirements:

  • Errors are wrapped with context using fmt.Errorf("operation: %w", err)
  • Error messages include operation context and relevant identifiers
  • Include correlation IDs (request IDs, resource names, namespaces)
  • No secrets or PII in error messages
// ✅ Domain/Business Logic - return enriched errors only
func (s *Service) Put(ctx context.Context, key string, value interface{}) error {
if err := s.validateInput(key, value); err != nil {
return fmt.Errorf("validate input for key %q: %w", key, err)
}

if err := s.storage.Store(key, value); err != nil {
return fmt.Errorf("store key %q: %w", key, err)
}

return nil
}

// ✅ Controller/Service Layer - LOG AND RETURN for operational visibility
func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
logger := log.FromContext(ctx).WithValues("resource", req.NamespacedName)

if err := r.service.Put(ctx, req.Name, data); err != nil {
logger.Error(err, "failed to store resource",
"operation", "reconcile_store",
"namespace", req.Namespace,
"name", req.Name)
return ctrl.Result{RequeueAfter: time.Minute}, err // Log AND return
}

logger.Info("resource stored successfully")
return ctrl.Result{}, nil
}

// ✅ Transport Layer - always log with full context
func (h *HTTPHandler) UpdateResource(w http.ResponseWriter, r *http.Request) {
requestID := middleware.GetRequestID(r.Context())
logger := log.WithFields(log.Fields{"request_id": requestID})

err := h.controller.Reconcile(r.Context(), req)
if err != nil {
logger.Error("request failed",
"error", err,
"method", r.Method,
"path", r.URL.Path,
"request_id", requestID)
http.Error(w, "internal server error", 500)
return
}
}

Error Classification

✅ Retryable vs Non-Retryable Error Classification

Classify errors to determine if operations should be retried automatically:

Retryable Errors (safe to retry automatically):

  • Temporary network failures (connection timeouts, DNS resolution failures)
  • Service unavailable errors (HTTP 503, temporary database connection issues)
  • Rate limiting errors (HTTP 429)
  • Transient infrastructure failures

Non-Retryable Errors (should not be retried):

  • Input validation failures (malformed JSON, invalid parameters)
  • Authentication/authorization failures (HTTP 401, 403)
  • Resource not found errors (HTTP 404)
  • Business logic violations (insufficient funds, duplicate records)

Implementation Guidelines:

  • Error types should implement interfaces for classification
  • Use specific error types rather than string matching
  • Document retry behavior clearly in API specifications

References:

type RetryableError struct {
Cause error
After time.Duration
}

func (e *RetryableError) Error() string {
return fmt.Sprintf("retryable error: %v", e.Cause)
}

func (e *RetryableError) Unwrap() error {
return e.Cause
}

Error Context and Messages

✅ Meaningful Error Messages

  • Include operation being performed
  • Include relevant identifiers (IDs, names, keys)
  • Provide enough context for debugging
  • Follow consistent format: "operation identifier: cause"
// ✅ Good
return fmt.Errorf("failed to update user %q in database: %w", userID, err)
return fmt.Errorf("invalid configuration for service %q: missing required field 'endpoint'", serviceName)

// ❌ Bad
return fmt.Errorf("error: %w", err)
return errors.New("something went wrong")

✅ Security Considerations

  • Never include secrets, passwords, or tokens in error messages
  • Never include PII (personally identifiable information)
  • Sanitize user input before including in error messages
  • Use generic identifiers where possible
// ✅ Good
return fmt.Errorf("authentication failed for user %q", userID)

// ❌ Bad
return fmt.Errorf("authentication failed: invalid password %q", password)
return fmt.Errorf("failed to process email %q", email) // PII

Error Handling Patterns

✅ Strategic Error Logging

✅ Production Reality: Operational visibility often requires logging at multiple levels

Preferred Pattern - Three-Layer Strategy:

  • Domain Layer: Return enriched errors, no logging (pure business logic)
  • Service Layer: Log significant business events + return errors (operational boundaries)
  • Transport Layer: Always log with full request context (HTTP handlers, controllers)

When to Log AND Return (Production Best Practice):

  • Controllers/Reconcilers: Always log errors for operational debugging
  • Service boundaries: Where business logic meets infrastructure
  • External integrations: API calls, database operations, message queues
  • Resource operations: Kubernetes API calls, file I/O, network operations
  • Critical paths: Where error loss would impact system reliability

Consistent Log-and-Return Pattern (REQUIRED for PR reviews):

// ✅ REQUIRED Pattern - Use this exact structure
if err := operation(ctx, resource); err != nil {
logger.Error("failed to [operation]",
[zap.Error(err) OR err], // Use zap.Error(err) for zap, just err for logr
"[logger_type]", "[operation_name]",
"namespace", resource.Namespace,
"name", resource.Name)
return result, fmt.Errorf("[operation] [resource_type] %s/%s: %w",
resource.Namespace, resource.Name, err)
}

Standard Pattern Example:

if err := r.createResource(ctx, resource); err != nil {
logger.Error(err, "failed to create resource",
"operation", "create_resource",
"namespace", req.Namespace,
"name", req.Name)
return res, fmt.Errorf("create resource %q: %w", req.NamespacedName, err)
}

Implementation Guidelines:

  • Include correlation IDs (request IDs, user IDs, resource identifiers)
  • Use structured logging with relevant context
  • Classify errors: validation (don't log) vs system errors (always log)
  • Choose appropriate log levels (ERROR for actionable issues)
  • ALWAYS follow the exact log-and-return pattern above for consistency

PR Review Checklist

Use this checklist to ensure consistency across all controllers and services

Required Pattern for Controllers/Services:

Every error in controllers MUST follow this exact pattern:

  1. Log the error with structured fields:

    • Error message starting with "failed to [operation]"
    • "operation" field with operation name
    • "namespace" and "name" fields for resource identification
    • Use zap.Error(err) for zap logger OR just err for logr
  2. Return wrapped error with context:

    • Use fmt.Errorf with %w verb
    • Include operation and resource identification
    • Format: "[operation] [resource_type] %s/%s: %w"
  3. Resource identification in both log and error:

    • Use req.Namespace and req.Name for request context
    • Use resource.Namespace and resource.Name for resource context

PR Review Red Flags:

  • Error returned without logging in controllers
  • Logged error without returning
  • Missing operation context in logs
  • Missing namespace/name in structured logging
  • Inconsistent error message format
  • Using different patterns across similar operations

Enforce this pattern in ALL future PRs for operational consistency!

// ✅ Good - log at boundary (Kubernetes/controller-runtime pattern)
func (h *Handler) CreateUser(w http.ResponseWriter, r *http.Request) {
user, err := h.service.CreateUser(ctx, req)
if err != nil {
logger.Error("failed to create user",
"correlation_id", correlationID,
"user_id", req.UserID,
"error", err)
http.Error(w, "internal server error", 500)
return
}
}

// Service layer - don't log, just return (matches Kubernetes business logic)
func (s *Service) CreateUser(ctx context.Context, req *CreateUserRequest) (*User, error) {
if err := s.repo.Save(ctx, user); err != nil {
return nil, fmt.Errorf("save user %q: %w", user.ID, err) // No logging here
}
}

// ✅ Real Kubernetes boundary logging example:
func (rsc *ReplicaSetController) processNextWorkItem(ctx context.Context) bool {
err := rsc.syncHandler(ctx, key) // calls business logic
if err != nil {
utilruntime.HandleError(fmt.Errorf("sync %q failed with %v", key, err)) // LOG HERE
rsc.queue.AddRateLimited(key)
}
return true
}

✅ Error Wrapping Strategy

When to wrap vs when to return directly

Error Wrapping Guidelines:

  • Controllers/Services: Always wrap errors with operation and resource context
  • Business Logic: Wrap when crossing architectural boundaries
  • Utility Functions: Usually return directly (context obvious or will be wrapped higher)
  • External Calls: Always wrap with operation context

Error Wrapping Benefits:

  • Debugging Speed: "create spark job ml-team/training: connection refused" vs "connection refused"
  • Resource Identification: Know exactly which resource failed in busy clusters
  • Operation Context: Understand the call chain that led to failure

When to Use Each Pattern:

✅ Wrap Errors (for operational context):

// Controllers - always wrap with resource context
func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) error {
if err := r.createResource(ctx, resource); err != nil {
return fmt.Errorf("reconcile %s %s: %w", resource.Kind, req.NamespacedName, err)
}
}

// Services - wrap with business context
func (s *Service) ProcessOrder(orderID string) error {
if err := s.validateOrder(orderID); err != nil {
return fmt.Errorf("process order %s: %w", orderID, err)
}
}

// External integrations - wrap with operation context
func (c *Client) CreateSparkJob(job *SparkJob) error {
if err := c.httpClient.Post(url, job); err != nil {
return fmt.Errorf("create spark job %s/%s: %w", job.Namespace, job.Name, err)
}
}

✅ Direct Return (when context is obvious):

// Validation functions
func validateEmail(email string) error {
if !emailRegex.MatchString(email) {
return ErrInvalidEmail // Context is obvious
}
}

// Simple getters (Kubernetes errors already have context)
func (r *Reconciler) getResource(ctx context.Context, name string) (*Resource, error) {
resource := &Resource{}
err := r.Get(ctx, types.NamespacedName{Name: name}, resource)
return resource, err // K8s error already includes resource info
}

// Internal utilities (will be wrapped by caller)
func connectDatabase(url string) (*sql.DB, error) {
return sql.Open("postgres", url) // Business layer will wrap
}

Implementation Requirements:

  • Use fmt.Errorf with %w verb for error wrapping
  • Preserve original error for errors.Is() and errors.As() checks
  • Include resource identifiers (namespace/name) in controller operations
  • Add operation context ("create", "update", "delete", "get")
  • Don't wrap nil errors
// ✅ Good
func (s *Service) ProcessOrder(ctx context.Context, orderID string) error {
order, err := s.repo.GetOrder(ctx, orderID)
if err != nil {
return fmt.Errorf("get order %q: %w", orderID, err)
}

if err := s.payment.Charge(ctx, order); err != nil {
return fmt.Errorf("charge payment for order %q: %w", orderID, err)
}

return nil
}

Anti-Patterns to Avoid

❌ These patterns are NOT found in Kubernetes, Controller-Runtime, or major Go projects

❌ Common Mistakes in Error Handling

  • Logging without context - Generic error messages without correlation IDs
  • Inconsistent logging strategy - Some errors logged, others not, no clear pattern
  • Double-wrapping errors - Wrapping already wrapped errors unnecessarily
  • Swallowing errors - Ignoring errors with _ = err
  • Generic error messages - "something went wrong", "error occurred"
  • Exposing internal details - Returning database errors directly to API clients
  • Missing operational context - Not logging enough detail for debugging production issues
// ❌ Bad - logging without context
if err != nil {
log.Error("error occurred", err) // No correlation ID, operation, or resource info
return fmt.Errorf("operation failed: %w", err)
}

// ❌ Bad - swallowing errors
result, _ := someOperation() // Error information lost

// ❌ Bad - no context in error messages
return err // No operation context, resource identifiers

// ❌ Bad - inconsistent logging (some errors logged, some not)
func (r *Controller) Reconcile(ctx context.Context, req ctrl.Request) error {
if err := r.Get(ctx, req.NamespacedName, &obj); err != nil {
return err // Not logged - might be lost
}

if err := r.Update(ctx, obj); err != nil {
logger.Error("update failed", err) // This one is logged
return err
}
}

// ✅ Good - consistent operational logging
func (r *Controller) Reconcile(ctx context.Context, req ctrl.Request) error {
logger := log.FromContext(ctx).WithValues("resource", req.NamespacedName)

if err := r.Get(ctx, req.NamespacedName, &obj); err != nil {
logger.Error(err, "failed to get resource", "operation", "get")
return fmt.Errorf("get resource %q: %w", req.NamespacedName, err)
}

if err := r.Update(ctx, obj); err != nil {
logger.Error(err, "failed to update resource", "operation", "update")
return fmt.Errorf("update resource %q: %w", req.NamespacedName, err)
}

return nil
}

Testing Error Handling

✅ Error Testing

  • Test both success and error paths
  • Verify error messages contain expected context
  • Test error wrapping and unwrapping with errors.Is() and errors.As()
  • Mock dependencies to simulate different error conditions
func TestService_ProcessOrder_PaymentFailure(t *testing.T) {
// Setup mocks
paymentErr := errors.New("payment declined")
mockPayment.EXPECT().Charge(gomock.Any(), gomock.Any()).Return(paymentErr)

// Test
err := service.ProcessOrder(ctx, "order-123")

// Verify
assert.Error(t, err)
assert.Contains(t, err.Error(), "charge payment for order \"order-123\"")
assert.True(t, errors.Is(err, paymentErr))
}

Performance Considerations

✅ Error Performance

  • Avoid creating errors in hot paths when not needed
  • Use pre-allocated sentinel errors for common cases
  • Consider error pooling for high-frequency operations
  • Don't format expensive strings unless error will be returned
// ✅ Good - sentinel errors
var (
ErrUserNotFound = errors.New("user not found")
ErrInvalidInput = errors.New("invalid input")
)

// ✅ Good - conditional formatting
func validateInput(input string) error {
if len(input) == 0 {
return ErrInvalidInput // Pre-allocated
}
if len(input) > maxLength {
return fmt.Errorf("input too long: %d > %d", len(input), maxLength)
}
return nil
}