Monitoring & alerts

What the background monitor checks, how alerts reach you in Telegram, and per-user preferences for mutes and quiet hours.

What runs every 5 minutes

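The background monitor loops over every server in the database and runs four checks on each one:

  1. Gateway health (openclaw health)
  2. Session token usage
  3. Root disk usage
  4. Provider errors in the gateway logs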
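A minimal sketch of one monitor cycle, assuming a hypothetical `run_checks(server)` callable that performs the checks over SSH and returns the alert types it found. Each server gets its own `try` block so that one unreachable server doesn't stop the loop, and the cycle returns a completion timestamp that a watchdog could read. Both are illustrative choices, not the documented implementation.

```python
import time
from typing import Callable, Iterable


def run_cycle(
    servers: Iterable[str],
    run_checks: Callable[[str], list[str]],  # hypothetical: runs the checks on one server
) -> tuple[dict[str, list[str]], dict[str, Exception], float]:
    """Run one monitor cycle over every server.

    Returns (alert types per server, errors per server, completion timestamp).
    """
    alerts: dict[str, list[str]] = {}
    errors: dict[str, Exception] = {}
    for server in servers:
        try:
            alerts[server] = run_checks(server)
        except Exception as exc:  # SSH error, timeout, etc.
            # Record the error and keep going so the other servers are still checked.
            errors[server] = exc
    # Assumed: the watchdog compares "now" against this completion time.
    return alerts, errors, time.time()
```

To get the 5-minute cadence, a caller could run this in a loop with `time.sleep(300)` between cycles, or trigger it from any scheduler.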

Consecutive failure threshold

If the monitor itself can't reach a server (SSH error, timeout, etc.), it waits for 3 consecutive failed attempts before firing a monitor_error alert. This prevents flapping alerts while a server is restarting or when the network is jittery.
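A minimal sketch of the consecutive-failure rule, assuming a per-server in-memory counter; the class and method names are hypothetical. It returns `True` only on the failure that brings the streak to 3, so a server that stays down produces one monitor_error per outage rather than one per cycle. That de-duplication is an assumption, since the rule above only says the alert waits for 3 failures. Any successful cycle resets the streak.

```python
FAILURE_THRESHOLD = 3  # consecutive failures before monitor_error fires


class FailureCounter:
    """Hypothetical tracker of consecutive monitor failures per server."""

    def __init__(self, threshold: int = FAILURE_THRESHOLD) -> None:
        self.threshold = threshold
        self.counts: dict[str, int] = {}

    def record_success(self, server: str) -> None:
        # A successful cycle ends the failure streak.
        self.counts[server] = 0

    def record_failure(self, server: str) -> bool:
        """Return True only on the failure that reaches the threshold (assumed: fire once per streak)."""
        self.counts[server] = self.counts.get(server, 0) + 1
        return self.counts[server] == self.threshold
```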

Watchdog

A separate timer runs every minute and checks when the monitor last completed a full cycle. If it's been more than 15 minutes, it bypasses the normal alert pipeline and sends a direct Telegram message to all super-admins: "EXMER monitor is STALE — last completed N min ago". This catches hangs that the monitor can't notice itself.
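The staleness test itself is a comparison against the 15-minute limit. In this sketch, `check_staleness` is a hypothetical helper that also returns the whole minutes elapsed (the N in the message); sending the direct Telegram message to super-admins is left out.

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(minutes=15)  # "more than 15 minutes" counts as stale


def check_staleness(last_completed: datetime, now: datetime) -> tuple[bool, int]:
    """Return (is_stale, whole minutes since the last completed monitor cycle)."""
    elapsed = now - last_completed
    minutes = int(elapsed.total_seconds() // 60)
    # Strictly greater than the limit, matching "more than 15 minutes".
    return elapsed > STALE_AFTER, minutes
```

The watchdog would call this once a minute and, when the first value is `True`, message every super-admin directly instead of going through the normal alert pipeline.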

Who gets alerts

For each fired alert, the recipient list is built from:

  1. The server's owner
  2. Admin members (not viewers) from server_access
  3. All super-admins from ADMIN_USER_IDS
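A sketch of how the recipient list could be assembled, assuming `server_access` rows carry a `role` of `"admin"` or `"viewer"` and that `ADMIN_USER_IDS` has already been parsed into a list of integer user IDs. The `Access` type is hypothetical. Removing duplicates, so that a super-admin who also owns the server gets only one message, is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class Access:
    """Hypothetical server_access row."""
    user_id: int
    role: str  # assumed values: "admin" or "viewer"


def build_recipients(owner_id: int, access: list[Access], super_admin_ids: list[int]) -> list[int]:
    candidates = [owner_id]                                           # 1. the server's owner
    candidates += [a.user_id for a in access if a.role == "admin"]   # 2. admins, not viewers
    candidates += super_admin_ids                                     # 3. super-admins
    # dict.fromkeys drops duplicates while keeping first-seen order (assumed behavior).
    return list(dict.fromkeys(candidates))
```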

Each recipient's personal notification preferences are then checked, and the alert is delivered only to recipients whose preferences allow it.

Notification preferences

Every authenticated user can mute individual alert types and set a quiet-hours window. Defaults: every alert type on, no quiet hours. Preferences are updated with PUT /api/me/notification-prefs. Quiet hours can wrap around midnight (e.g. 22:00–07:00 for an overnight window).
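A minimal sketch of the per-recipient filter, with hypothetical names. It assumes quiet hours are compared against a local wall-clock time supplied by the caller, that the window includes its start and excludes its end, that equal start and end means no window, and that an alert inside quiet hours is simply not delivered. Whether such alerts are really dropped or deferred isn't specified. A window whose start is later than its end is treated as wrapping past midnight.

```python
from dataclasses import dataclass, field
from datetime import time
from typing import Optional


@dataclass
class NotificationPrefs:
    """Hypothetical per-user preferences; defaults: nothing muted, no quiet hours."""
    muted: set[str] = field(default_factory=set)  # alert types the user turned off
    quiet_start: Optional[time] = None            # None means no quiet hours
    quiet_end: Optional[time] = None


def in_quiet_hours(now: time, start: time, end: time) -> bool:
    if start == end:
        return False  # assumed: an empty window
    if start < end:
        return start <= now < end  # same-day window, e.g. 13:00-14:00
    return now >= start or now < end  # wraps past midnight, e.g. 22:00-07:00


def should_deliver(alert_type: str, prefs: NotificationPrefs, now: time) -> bool:
    if alert_type in prefs.muted:
        return False
    if prefs.quiet_start is not None and prefs.quiet_end is not None:
        if in_quiet_hours(now, prefs.quiet_start, prefs.quiet_end):
            return False  # assumed: suppressed, not queued for later
    return True
```

With a 22:00 to 07:00 window, an alert at 23:30 or 03:00 is not delivered, while one at 12:00 goes through.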

Alert type          Fires when                          Severity
gateway_down        openclaw health returns non-zero    critical
connection_failed   SSH connection to server fails      critical
token_overflow      Session tokens above 80%            warning
disk_full           Root disk above 90%                 warning
api_errors          Provider errors in gateway logs     warning / critical
monitor_error       3 consecutive monitor failures      critical
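A sketch of how one server's check results could map to the alert types above, using the 80% and 90% thresholds from the table. The `CheckResult` fields are hypothetical, and so is the api_errors rule: the table only says it is a warning or critical, so this sketch fires on any provider error and escalates at an invented cut-off of 10 errors. connection_failed and monitor_error come from the connection path rather than from these results, so they are not covered here.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Hypothetical snapshot of one server's check results."""
    gateway_exit_code: int   # exit status of `openclaw health`
    token_usage_pct: float   # session token usage, in percent
    disk_usage_pct: float    # root disk usage, in percent
    provider_errors: int     # provider errors found in gateway logs


# Hypothetical cut-off: the docs don't say when api_errors becomes critical.
API_ERRORS_CRITICAL_AT = 10


def evaluate(result: CheckResult) -> list[tuple[str, str]]:
    """Return (alert_type, severity) pairs for one server."""
    alerts: list[tuple[str, str]] = []
    if result.gateway_exit_code != 0:
        alerts.append(("gateway_down", "critical"))
    if result.token_usage_pct > 80:
        alerts.append(("token_overflow", "warning"))
    if result.disk_usage_pct > 90:
        alerts.append(("disk_full", "warning"))
    if result.provider_errors > 0:  # assumed: any provider error fires
        severity = "critical" if result.provider_errors >= API_ERRORS_CRITICAL_AT else "warning"
        alerts.append(("api_errors", severity))
    return alerts
```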

Alert storage

Every fired alert is also stored in the alerts table with read/unread status. The Alerts page in the Mini App shows the full history.
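A sketch of alert storage with read/unread status, using SQLite purely for illustration; the database engine, column names, and helper functions are all assumptions. A single `is_read` flag per alert is also assumed, though a multi-user setup might track read state per recipient instead.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema for the alerts table.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS alerts (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               server_id INTEGER NOT NULL,
               alert_type TEXT NOT NULL,
               severity TEXT NOT NULL,
               message TEXT NOT NULL,
               created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
               is_read INTEGER NOT NULL DEFAULT 0
           )"""
    )


def store_alert(conn: sqlite3.Connection, server_id: int, alert_type: str,
                severity: str, message: str) -> int:
    """Insert a fired alert as unread and return its id."""
    cur = conn.execute(
        "INSERT INTO alerts (server_id, alert_type, severity, message) VALUES (?, ?, ?, ?)",
        (server_id, alert_type, severity, message),
    )
    return cur.lastrowid


def mark_read(conn: sqlite3.Connection, alert_id: int) -> None:
    conn.execute("UPDATE alerts SET is_read = 1 WHERE id = ?", (alert_id,))


def unread_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM alerts WHERE is_read = 0").fetchone()[0]
```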