02 — Health Checks¶
Prerequisite: 01 — HTTP Gateway You will need: Working setup from recipe 01, ability to kill the test MCP server Time: 10 minutes Adds: Automatic health monitoring with state transitions on failure
The Problem¶
Your MCP server from recipe 01 crashes at 3 AM. Hangar doesn't know. It keeps sending requests to a dead endpoint and forwards cryptic connection errors back to Claude. Claude retries. More errors. The on-call engineer gets paged because "AI is broken" — but nobody knows which MCP server is down until someone checks logs. Hangar could have told you in 30 seconds.
The Config¶
# config.yaml — Recipe 02: Health Checks
mcp_servers:
my-mcp:
mode: remote
endpoint: http://localhost:8080/mcp
description: "My remote MCP server"
health_check_interval_s: 30 # NEW: added in this recipe
max_consecutive_failures: 3 # NEW: added in this recipe
http:
connect_timeout: 10.0
read_timeout: 30.0
Save this as ~/.config/mcp-hangar/config.yaml (or update your existing file).
Try It¶
- Start Hangar with health checks enabled (background process)
Note the process ID. Health check worker starts automatically.
- Check initial status
cat > /tmp/check-status.sh << 'EOF'
#!/bin/bash
(
echo '{"jsonrpc":"2.0","method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"1.0"}},"id":1}'
sleep 0.5
echo '{"jsonrpc":"2.0","method":"notifications/initialized","params":{}}'
sleep 0.5
echo '{"jsonrpc":"2.0","method":"tools/call","params":{"name":"hangar_status","arguments":{}},"id":2}'
sleep 2
) | nc localhost 8765 2>/dev/null || echo '{"result":"Use stdio mode or check logs"}'
EOF
chmod +x /tmp/check-status.sh
# Alternative: Check logs
tail -5 /tmp/hangar.log
INFO mcp_server_state mcp_server_id=my-mcp state=cold
INFO mcp_registry_ready mcp_servers=['my-mcp']
- Trigger MCP server start
(
echo '{"jsonrpc":"2.0","method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"1.0"}},"id":1}'
sleep 0.5
echo '{"jsonrpc":"2.0","method":"notifications/initialized","params":{}}'
sleep 0.5
echo '{"jsonrpc":"2.0","method":"tools/call","params":{"name":"hangar_list","arguments":{}},"id":2}'
sleep 3
pkill -HUP -f "mcp-hangar"
) | mcp-hangar --config ~/.config/mcp-hangar/config.yaml serve 2>&1 | grep -E "ready|cold"
MCP Server transitioned to READY. Health checks now active.
- Simulate MCP server failure
Kill the subprocess MCP server directly (if using subprocess mode) or simulate network failure (if using remote mode):
Or forcefully fail the MCP server by breaking its process.
- Wait for health check cycle (40 seconds)
echo "Waiting for health checks to detect failure..."
sleep 40
tail -10 /tmp/hangar.log | grep -E "health_check|degraded"
WARNING health_check_failed mcp_server_id=my-mcp error=Connection refused
WARNING health_check_failed mcp_server_id=my-mcp error=Connection refused
WARNING health_check_failed mcp_server_id=my-mcp error=Connection refused
WARNING mcp_server_degraded_by_health_check mcp_server_id=my-mcp
- Verify DEGRADED state
# Check the process still running
ps aux | grep "mcp-hangar" | grep -v grep
# Check latest logs
tail -3 /tmp/hangar.log
- MCP Server will auto-recover if restarted
For subprocess mode, Hangar will attempt restart on next invocation. For remote mode, restart the remote server manually.
- Stop Hangar
What Just Happened¶
Hangar's background health check worker probes each READY MCP server every 60 seconds. The probe mechanism sends a tools/list JSON-RPC request to the MCP server with a 5-second timeout. If the MCP server responds successfully, the health check passes and consecutive_failures resets to 0.
When the MCP server fails to respond, Hangar records a failure in the HealthTracker. After max_consecutive_failures (3 by default) failed checks, the MCP server state transitions from READY to DEGRADED. This transition emits a McpServerDegraded domain event, which updates metrics and triggers alerts.
When the MCP server comes back online, Hangar reinitializes it (DEGRADED -> INITIALIZING -> READY). No manual intervention required -- Hangar detected the failure and recovery without human involvement.
State machine transitions:
- READY -> DEGRADED (after 3 consecutive failures)
- DEGRADED -> INITIALIZING -> READY (on reinitialize after recovery)
Key Config Reference¶
| Key | Type | Default | Description |
|---|---|---|---|
health_check_interval_s | int | 30 | Per-MCP server health check interval (seconds) |
max_consecutive_failures | int | 3 | Failures before state transition to DEGRADED |
What's Next¶
Your MCP server is monitored — but when it starts failing intermittently, every request during the 30-second health check interval still hits a broken server. You need automatic protection that trips immediately on the first failure, not after 3 health checks.