Observability Without Compromising Security
Observability and security are often in tension. You want to know everything about your system’s behavior, but every data collection surface is also an attack surface. BeaconSSH resolves this by physically isolating diagnostics from the security path.
This post covers why that isolation exists and how the diagnostic server works.
The Problem with Observability in Security Systems
Telemetry and logging systems tend to have permissive input handling. They accept data from many sources, parse varied formats, and store large volumes. This makes them attractive targets:
- Data exfiltration — diagnostic endpoints that accept arbitrary payloads can be used to exfiltrate data if the server is compromised
- Injection via telemetry — if the diagnostic server shares infrastructure with the auth server, a crafted telemetry payload might reach authentication logic
- Availability impact — a spike in telemetry writes could degrade the certificate server if they share database connections
In most systems, the monitoring layer has some access to the system being monitored. In BeaconSSH, it has none.
Full Isolation
The diagnostic server is a separate service — separate process, separate database, separate network path. It has:
- No access to CA keys (private or public)
- No access to user records or authorization policies
- No access to the authentication flow
- No ability to issue, revoke, or modify certificates
Even if the diagnostic server is fully compromised, the attacker gains only telemetry data. They cannot use it to escalate to SSH access. There is no lateral movement path from diagnostics to the certificate server.
What Gets Collected
The diagnostic server receives opt-in telemetry from clients:
- Certificate issuance latency — time from request to signed certificate
- Error events — auth failures, network timeouts, certificate validation errors
- Client environment — OS, SSH client version, agent availability
- Connection success/failure rates — did the certificate actually work?
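The fields above might be modeled as a single event record. A minimal sketch, assuming JSON serialization — the field names here are illustrative, not BeaconSSH's actual wire format:

```python
from dataclasses import dataclass, asdict
import json
import time

@dataclass
class TelemetryEvent:
    """One opt-in telemetry record from a client (field names illustrative)."""
    event_type: str            # "issuance", "error", or "connection"
    issuance_latency_ms: int   # time from request to signed certificate
    os: str                    # client environment
    ssh_client_version: str
    agent_available: bool
    connection_ok: bool        # did the certificate actually work?
    timestamp: float

event = TelemetryEvent(
    event_type="issuance",
    issuance_latency_ms=142,
    os="linux",
    ssh_client_version="OpenSSH_9.6",
    agent_available=True,
    connection_ok=True,
    timestamp=time.time(),
)
payload = json.dumps(asdict(event))  # serialized before compression and signing
```

Keeping the record flat and explicitly typed is what makes the server-side schema validation described below cheap to enforce.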
All payloads are:
- Compressed with Brotli to minimize bandwidth
- Signed by the client to ensure integrity — the server rejects unsigned or tampered payloads
- Schema-validated before processing — unexpected fields or formats are rejected
- Size-limited — payloads exceeding a threshold are dropped to prevent storage abuse
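The server-side checks above can be sketched as a single validation pipeline. This is a sketch, not BeaconSSH's implementation: HMAC-SHA256 stands in for whatever signature scheme the client actually uses, the size threshold and required fields are invented for illustration, and the Brotli decompression step is omitted for brevity (it would sit between the size check and parsing):

```python
import hashlib
import hmac
import json

MAX_PAYLOAD_BYTES = 64 * 1024                   # illustrative size threshold
REQUIRED_FIELDS = {"event_type", "timestamp"}   # illustrative schema
SHARED_KEY = b"per-client-secret"               # stand-in for real client key material

def accept_payload(raw: bytes, signature: str) -> dict:
    """Validate a telemetry payload: size limit, integrity check, schema check."""
    # 1. Size limit: oversized payloads are dropped to prevent storage abuse.
    if len(raw) > MAX_PAYLOAD_BYTES:
        raise ValueError("payload too large")

    # 2. Integrity: reject unsigned or tampered payloads.
    expected = hmac.new(SHARED_KEY, raw, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        raise ValueError("bad signature")

    # 3. Schema validation: unexpected shapes are rejected before processing.
    event = json.loads(raw)
    if not isinstance(event, dict) or not REQUIRED_FIELDS <= event.keys():
        raise ValueError("schema violation")
    return event

raw = json.dumps({"event_type": "issuance", "timestamp": 1700000000.0}).encode()
sig = hmac.new(SHARED_KEY, raw, hashlib.sha256).hexdigest()
event = accept_payload(raw, sig)
```

The ordering matters: the size check runs before any parsing or cryptography, so a hostile payload is rejected at the cheapest possible stage.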
One-Way Data Flow
The diagnostic server can read the data sent to it, but it cannot send commands back to clients or to the certificate server, and it cannot call any mutating API on the certificate server.
The only outbound action the diagnostic server takes: health monitoring of the certificate server via a public health endpoint, with optional email alerts to administrators. This is a read-only HTTP check — no authentication, no mutation.
Clients ──telemetry──▶ Diagnostic Server ──health check──▶ Cert Server (read-only)
                               │
                               ▼
                        Alerts (email)
There is no return path. The certificate server doesn’t know the diagnostic server exists.
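The health check described above might look like the following sketch. The endpoint URL and alert recipient are hypothetical, and the HTTP fetch and email send are injectable so the logic can be exercised without a network — the real service would presumably run this on a timer:

```python
import smtplib
from email.message import EmailMessage
from urllib.request import urlopen

HEALTH_URL = "https://cert.example.com/healthz"   # hypothetical public endpoint
ADMIN_EMAIL = "ops@example.com"                   # hypothetical recipient

def cert_server_healthy(fetch=None) -> bool:
    """Read-only GET against the public health endpoint. No auth, no mutation."""
    fetch = fetch or (lambda: urlopen(HEALTH_URL, timeout=5).status)
    try:
        return fetch() == 200
    except Exception:
        return False  # unreachable counts as unhealthy

def alert_if_down(fetch=None, send=None) -> bool:
    """Email administrators when the check fails. Returns True if an alert was sent."""
    if cert_server_healthy(fetch):
        return False
    msg = EmailMessage()
    msg["To"] = ADMIN_EMAIL
    msg["Subject"] = "certificate server health check failed"
    send = send or (lambda m: smtplib.SMTP("localhost").send_message(m))
    send(msg)
    return True
```

Note that the check carries no credentials: even this one outbound path from diagnostics to the certificate server holds nothing worth stealing.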
Aggregation and Insights
Raw telemetry is aggregated into operational insights:
- Latency percentiles over time — is certificate issuance getting slower?
- Error rate trends — are OAuth failures increasing? (might indicate provider issues)
- Client version distribution — are users on outdated CLIs?
- Geographic patterns — are certain regions experiencing higher failure rates?
These aggregations are exposed through the admin dashboard as read-only views. They inform operational decisions but don’t drive automated actions — no auto-scaling, no auto-remediation, no policy changes based on telemetry.
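The latency aggregation can be illustrated with a nearest-rank percentile sketch — a common, simple definition; BeaconSSH's actual aggregation pipeline may compute these differently:

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample >= p percent of the data."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]

def latency_summary(latencies_ms: list[float]) -> dict:
    """Aggregate raw issuance latencies into the percentiles a dashboard shows."""
    return {p: percentile(latencies_ms, p) for p in (50, 95, 99)}

# One slow issuance (900 ms) barely moves the median but dominates the tail.
week = [120, 130, 125, 140, 900, 135, 128, 132, 127, 131]
summary = latency_summary(week)  # {50: 130, 95: 900, 99: 900}
```

This is exactly why the post tracks percentiles over time rather than averages: a mean would smear that 900 ms outlier across every request, while p95/p99 surface it directly.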
Why Not Just Use Prometheus / Grafana?
You could expose metrics from the certificate server and scrape them with Prometheus. Many systems do this. But:
- Prometheus scrapes introduce a network path to the certificate server
- Metric endpoints can leak sensitive information if not carefully scoped
- The scraper becomes a dependency — if it misbehaves, it could impact the certificate server
- Alert evaluation happens outside the system, creating another component with potential access
BeaconSSH’s approach pushes all diagnostic data outward. The certificate server emits nothing. Clients send telemetry to a completely separate system. The security boundary is clean.
The tradeoff: you lose some operational convenience. You can’t just curl /metrics on the certificate server. You get system-level metrics (CPU, memory, request count) from standard infrastructure monitoring, and application-level insights from the diagnostic server. Two systems instead of one. That’s the cost of isolation.
Summary
The diagnostic server exists because monitoring a security-critical system without compromising it requires physical separation. Shared infrastructure between telemetry and authentication is a latent vulnerability. By making the diagnostic server a fully isolated, one-way data sink with no access to the security path, BeaconSSH ensures that observability cannot become an attack vector.
This is the final post in the BeaconSSH series. The system is in active development — future posts will cover hardware key support, multi-provider identity federation, and the operational experience of running a short-lived certificate authority.