Monitoring
CT-Ops collects system metrics from every agent and lets you define health checks that run on a schedule. Metrics are stored in TimescaleDB and visualised as time-series charts.
Metrics Collection
The agent sends the following vitals with every heartbeat:
| Metric | Unit | Description |
|---|---|---|
cpu_percent | % | CPU utilisation across all cores |
memory_percent | % | RAM utilisation |
disk_percent | % | Root filesystem utilisation |
uptime_seconds | seconds | Host uptime since last boot |
Future versions will add per-disk, per-NIC, and per-process metrics.
Metric Charts
Charts are available on the Metrics tab of each host detail page. Each chart is interactive:
- Time range — 1 hour, 6 hours, 24 hours, 7 days, 30 days
- Zoom — click and drag on any chart to zoom into a specific window
- Smart bucketing — data is automatically aggregated to the appropriate resolution based on the selected time range, using TimescaleDB continuous aggregates
Health Checks
Health checks run on the agent and report a pass/fail result back to the ingest service. Results are stored and can trigger alert rules.
Check types
| Type | What it checks |
|---|---|
port | TCP/UDP connectivity to a host:port |
process | Whether a named process is running |
http | HTTP endpoint reachability and optional status code check |
Configuring checks
Checks are configured per-host from the Checks tab of the host detail page:
- Click Add Check
- Select the check type
- Fill in the parameters (target address, port, expected status code, etc.)
- Set the check interval
- Click Save
The check is pushed to the agent on the next heartbeat and starts running immediately.
TimescaleDB Storage
Raw metrics are written to a TimescaleDB hypertable partitioned by time. Three continuous aggregates are pre-configured:
| Aggregate | Bucket size | Retention |
|---|---|---|
metrics_1m | 1 minute | 7 days |
metrics_1h | 1 hour | 90 days |
metrics_1d | 1 day | 2 years |
The appropriate aggregate is selected automatically based on the chart's time range.
Alerting on Metrics
You can create alert rules that fire when a metric crosses a threshold — for example, CPU > 90% for 5 minutes. See Alerts for details.