Golang Error Handling Code Review Checklist
Core Principles
✅ Errors as Values
- Errors are treated as values, not control flow mechanisms
- No panics for expected failures or recoverable conditions
- Error handling follows Go's idiomatic
if err != nilpattern
✅ Error Propagation Strategy
✅ Production Reality: Different layers have different logging responsibilities
Three-Layer Approach:
- Domain/Business Logic: Return enriched errors without logging (focus on correctness)
- Service/Controller Layer: Log AND return for operational visibility (production requirement)
- Transport Layer: Always log with full request context (debugging essential)
Error Context Requirements:
- Errors are wrapped with context using
fmt.Errorf("operation: %w", err) - Error messages include operation context and relevant identifiers
- Include correlation IDs (request IDs, resource names, namespaces)
- No secrets or PII in error messages
// ✅ Domain/Business Logic - return enriched errors only
func (s *Service) Put(ctx context.Context, key string, value interface{}) error {
if err := s.validateInput(key, value); err != nil {
return fmt.Errorf("validate input for key %q: %w", key, err)
}
if err := s.storage.Store(key, value); err != nil {
return fmt.Errorf("store key %q: %w", key, err)
}
return nil
}
// ✅ Controller/Service Layer - LOG AND RETURN for operational visibility
func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
logger := log.FromContext(ctx).WithValues("resource", req.NamespacedName)
if err := r.service.Put(ctx, req.Name, data); err != nil {
logger.Error(err, "failed to store resource",
"operation", "reconcile_store",
"namespace", req.Namespace,
"name", req.Name)
return ctrl.Result{RequeueAfter: time.Minute}, err // Log AND return
}
logger.Info("resource stored successfully")
return ctrl.Result{}, nil
}
// ✅ Transport Layer - always log with full context
func (h *HTTPHandler) UpdateResource(w http.ResponseWriter, r *http.Request) {
requestID := middleware.GetRequestID(r.Context())
logger := log.WithFields(log.Fields{"request_id": requestID})
err := h.controller.Reconcile(r.Context(), req)
if err != nil {
logger.Error("request failed",
"error", err,
"method", r.Method,
"path", r.URL.Path,
"request_id", requestID)
http.Error(w, "internal server error", 500)
return
}
}
Error Classification
✅ Retryable vs Non-Retryable Error Classification
Classify errors to determine if operations should be retried automatically:
Retryable Errors (safe to retry automatically):
- Temporary network failures (connection timeouts, DNS resolution failures)
- Service unavailable errors (HTTP 503, temporary database connection issues)
- Rate limiting errors (HTTP 429)
- Transient infrastructure failures
Non-Retryable Errors (should not be retried):
- Input validation failures (malformed JSON, invalid parameters)
- Authentication/authorization failures (HTTP 401, 403)
- Resource not found errors (HTTP 404)
- Business logic violations (insufficient funds, duplicate records)
Implementation Guidelines:
- Error types should implement interfaces for classification
- Use specific error types rather than string matching
- Document retry behavior clearly in API specifications
References:
type RetryableError struct {
Cause error
After time.Duration
}
func (e *RetryableError) Error() string {
return fmt.Sprintf("retryable error: %v", e.Cause)
}
func (e *RetryableError) Unwrap() error {
return e.Cause
}
Error Context and Messages
✅ Meaningful Error Messages
- Include operation being performed
- Include relevant identifiers (IDs, names, keys)
- Provide enough context for debugging
- Follow consistent format:
"operation identifier: cause"
// ✅ Good
return fmt.Errorf("failed to update user %q in database: %w", userID, err)
return fmt.Errorf("invalid configuration for service %q: missing required field 'endpoint'", serviceName)
// ❌ Bad
return fmt.Errorf("error: %w", err)
return errors.New("something went wrong")
✅ Security Considerations
- Never include secrets, passwords, or tokens in error messages
- Never include PII (personally identifiable information)
- Sanitize user input before including in error messages
- Use generic identifiers where possible
// ✅ Good
return fmt.Errorf("authentication failed for user %q", userID)
// ❌ Bad
return fmt.Errorf("authentication failed: invalid password %q", password)
return fmt.Errorf("failed to process email %q", email) // PII
Error Handling Patterns
✅ Strategic Error Logging
✅ Production Reality: Operational visibility often requires logging at multiple levels
Preferred Pattern - Three-Layer Strategy:
- Domain Layer: Return enriched errors, no logging (pure business logic)
- Service Layer: Log significant business events + return errors (operational boundaries)
- Transport Layer: Always log with full request context (HTTP handlers, controllers)
When to Log AND Return (Production Best Practice):
- Controllers/Reconcilers: Always log errors for operational debugging
- Service boundaries: Where business logic meets infrastructure
- External integrations: API calls, database operations, message queues
- Resource operations: Kubernetes API calls, file I/O, network operations
- Critical paths: Where error loss would impact system reliability
Consistent Log-and-Return Pattern (REQUIRED for PR reviews):
// ✅ REQUIRED Pattern - Use this exact structure
if err := operation(ctx, resource); err != nil {
logger.Error("failed to [operation]",
[zap.Error(err) OR err], // Use zap.Error(err) for zap, just err for logr
"[logger_type]", "[operation_name]",
"namespace", resource.Namespace,
"name", resource.Name)
return result, fmt.Errorf("[operation] [resource_type] %s/%s: %w",
resource.Namespace, resource.Name, err)
}
Standard Pattern Example:
if err := r.createResource(ctx, resource); err != nil {
logger.Error(err, "failed to create resource",
"operation", "create_resource",
"namespace", req.Namespace,
"name", req.Name)
return res, fmt.Errorf("create resource %q: %w", req.NamespacedName, err)
}
Implementation Guidelines:
- Include correlation IDs (request IDs, user IDs, resource identifiers)
- Use structured logging with relevant context
- Classify errors: validation (don't log) vs system errors (always log)
- Choose appropriate log levels (ERROR for actionable issues)
- ALWAYS follow the exact log-and-return pattern above for consistency
PR Review Checklist
Use this checklist to ensure consistency across all controllers and services
✅ Required Pattern for Controllers/Services:
Every error in controllers MUST follow this exact pattern:
-
Log the error with structured fields:
- Error message starting with "failed to [operation]"
"operation"field with operation name"namespace"and"name"fields for resource identification- Use
zap.Error(err)for zap logger OR justerrfor logr
-
Return wrapped error with context:
- Use
fmt.Errorfwith%wverb - Include operation and resource identification
- Format:
"[operation] [resource_type] %s/%s: %w"
- Use
-
Resource identification in both log and error:
- Use
req.Namespaceandreq.Namefor request context - Use
resource.Namespaceandresource.Namefor resource context
- Use
❌ PR Review Red Flags:
- Error returned without logging in controllers
- Logged error without returning
- Missing operation context in logs
- Missing namespace/name in structured logging
- Inconsistent error message format
- Using different patterns across similar operations
Enforce this pattern in ALL future PRs for operational consistency!
// ✅ Good - log at boundary (Kubernetes/controller-runtime pattern)
func (h *Handler) CreateUser(w http.ResponseWriter, r *http.Request) {
user, err := h.service.CreateUser(ctx, req)
if err != nil {
logger.Error("failed to create user",
"correlation_id", correlationID,
"user_id", req.UserID,
"error", err)
http.Error(w, "internal server error", 500)
return
}
}
// Service layer - don't log, just return (matches Kubernetes business logic)
func (s *Service) CreateUser(ctx context.Context, req *CreateUserRequest) (*User, error) {
if err := s.repo.Save(ctx, user); err != nil {
return nil, fmt.Errorf("save user %q: %w", user.ID, err) // No logging here
}
}
// ✅ Real Kubernetes boundary logging example:
func (rsc *ReplicaSetController) processNextWorkItem(ctx context.Context) bool {
err := rsc.syncHandler(ctx, key) // calls business logic
if err != nil {
utilruntime.HandleError(fmt.Errorf("sync %q failed with %v", key, err)) // LOG HERE
rsc.queue.AddRateLimited(key)
}
return true
}
✅ Error Wrapping Strategy
When to wrap vs when to return directly
Error Wrapping Guidelines:
- Controllers/Services: Always wrap errors with operation and resource context
- Business Logic: Wrap when crossing architectural boundaries
- Utility Functions: Usually return directly (context obvious or will be wrapped higher)
- External Calls: Always wrap with operation context
Error Wrapping Benefits:
- Debugging Speed: "create spark job ml-team/training: connection refused" vs "connection refused"
- Resource Identification: Know exactly which resource failed in busy clusters
- Operation Context: Understand the call chain that led to failure
When to Use Each Pattern:
✅ Wrap Errors (for operational context):
// Controllers - always wrap with resource context
func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) error {
if err := r.createResource(ctx, resource); err != nil {
return fmt.Errorf("reconcile %s %s: %w", resource.Kind, req.NamespacedName, err)
}
}
// Services - wrap with business context
func (s *Service) ProcessOrder(orderID string) error {
if err := s.validateOrder(orderID); err != nil {
return fmt.Errorf("process order %s: %w", orderID, err)
}
}
// External integrations - wrap with operation context
func (c *Client) CreateSparkJob(job *SparkJob) error {
if err := c.httpClient.Post(url, job); err != nil {
return fmt.Errorf("create spark job %s/%s: %w", job.Namespace, job.Name, err)
}
}
✅ Direct Return (when context is obvious):
// Validation functions
func validateEmail(email string) error {
if !emailRegex.MatchString(email) {
return ErrInvalidEmail // Context is obvious
}
}
// Simple getters (Kubernetes errors already have context)
func (r *Reconciler) getResource(ctx context.Context, name string) (*Resource, error) {
resource := &Resource{}
err := r.Get(ctx, types.NamespacedName{Name: name}, resource)
return resource, err // K8s error already includes resource info
}
// Internal utilities (will be wrapped by caller)
func connectDatabase(url string) (*sql.DB, error) {
return sql.Open("postgres", url) // Business layer will wrap
}
Implementation Requirements:
- Use
fmt.Errorfwith%wverb for error wrapping - Preserve original error for
errors.Is()anderrors.As()checks - Include resource identifiers (namespace/name) in controller operations
- Add operation context ("create", "update", "delete", "get")
- Don't wrap nil errors
// ✅ Good
func (s *Service) ProcessOrder(ctx context.Context, orderID string) error {
order, err := s.repo.GetOrder(ctx, orderID)
if err != nil {
return fmt.Errorf("get order %q: %w", orderID, err)
}
if err := s.payment.Charge(ctx, order); err != nil {
return fmt.Errorf("charge payment for order %q: %w", orderID, err)
}
return nil
}
Anti-Patterns to Avoid
❌ These patterns are NOT found in Kubernetes, Controller-Runtime, or major Go projects
❌ Common Mistakes in Error Handling
- Logging without context - Generic error messages without correlation IDs
- Inconsistent logging strategy - Some errors logged, others not, no clear pattern
- Double-wrapping errors - Wrapping already wrapped errors unnecessarily
- Swallowing errors - Ignoring errors with
_ = err - Generic error messages - "something went wrong", "error occurred"
- Exposing internal details - Returning database errors directly to API clients
- Missing operational context - Not logging enough detail for debugging production issues
// ❌ Bad - logging without context
if err != nil {
log.Error("error occurred", err) // No correlation ID, operation, or resource info
return fmt.Errorf("operation failed: %w", err)
}
// ❌ Bad - swallowing errors
result, _ := someOperation() // Error information lost
// ❌ Bad - no context in error messages
return err // No operation context, resource identifiers
// ❌ Bad - inconsistent logging (some errors logged, some not)
func (r *Controller) Reconcile(ctx context.Context, req ctrl.Request) error {
if err := r.Get(ctx, req.NamespacedName, &obj); err != nil {
return err // Not logged - might be lost
}
if err := r.Update(ctx, obj); err != nil {
logger.Error("update failed", err) // This one is logged
return err
}
}
// ✅ Good - consistent operational logging
func (r *Controller) Reconcile(ctx context.Context, req ctrl.Request) error {
logger := log.FromContext(ctx).WithValues("resource", req.NamespacedName)
if err := r.Get(ctx, req.NamespacedName, &obj); err != nil {
logger.Error(err, "failed to get resource", "operation", "get")
return fmt.Errorf("get resource %q: %w", req.NamespacedName, err)
}
if err := r.Update(ctx, obj); err != nil {
logger.Error(err, "failed to update resource", "operation", "update")
return fmt.Errorf("update resource %q: %w", req.NamespacedName, err)
}
return nil
}
Testing Error Handling
✅ Error Testing
- Test both success and error paths
- Verify error messages contain expected context
- Test error wrapping and unwrapping with
errors.Is()anderrors.As() - Mock dependencies to simulate different error conditions
func TestService_ProcessOrder_PaymentFailure(t *testing.T) {
// Setup mocks
paymentErr := errors.New("payment declined")
mockPayment.EXPECT().Charge(gomock.Any(), gomock.Any()).Return(paymentErr)
// Test
err := service.ProcessOrder(ctx, "order-123")
// Verify
assert.Error(t, err)
assert.Contains(t, err.Error(), "charge payment for order \"order-123\"")
assert.True(t, errors.Is(err, paymentErr))
}
Performance Considerations
✅ Error Performance
- Avoid creating errors in hot paths when not needed
- Use pre-allocated sentinel errors for common cases
- Consider error pooling for high-frequency operations
- Don't format expensive strings unless error will be returned
// ✅ Good - sentinel errors
var (
ErrUserNotFound = errors.New("user not found")
ErrInvalidInput = errors.New("invalid input")
)
// ✅ Good - conditional formatting
func validateInput(input string) error {
if len(input) == 0 {
return ErrInvalidInput // Pre-allocated
}
if len(input) > maxLength {
return fmt.Errorf("input too long: %d > %d", len(input), maxLength)
}
return nil
}