BeaconSSH Architecture Deep Dive
The previous post explained the problem: a web application that needs to SSH into remote machines on behalf of users without storing credentials. The answer was SSH certificates. This post is about the system that issues them — and everything else around it.
Design Principles
Four principles shaped every decision:
- The system is not in the SSH data path. BeaconSSH issues certificates. It doesn’t proxy connections. When a user SSHs into a host, BeaconSSH is not involved.
- Stateless authentication. The certificate server does not maintain sessions. Every request is authenticated independently against the OAuth provider, so there is no session token to steal.
- Offline enforcement. Hosts validate certificates locally using the CA’s public key. They never call back to the certificate server during an SSH connection.
- Separation of concerns. No component reaches outside its boundary. Identity stays in the user CLI, authorization stays in the certificate server, and trust distribution stays in the host CLI.
The Six Components
```text
┌─────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  User CLI   │────▶│   Certificate    │◀────│    Host CLI     │
│ (identity,  │     │      Server      │     │     (trust      │
│  key mgmt)  │     │   (CA, authz,    │     │  distribution)  │
└─────────────┘     │      audit)      │     └─────────────────┘
                    └────────┬─────────┘
                             │
              ┌──────────────┼──────────────┐
              │              │              │
       ┌──────▼─────┐ ┌──────▼─────┐ ┌──────▼──────┐
       │ Diagnostic │ │ Web Portal │ │    Admin    │
       │   Server   │ │  (users)   │ │  Dashboard  │
       └────────────┘ └────────────┘ └─────────────┘
```
Certificate Server — The Core
The certificate server is stateless per request. It receives a signing request containing an OAuth access token, a public key, requested principals, and an optional target host group.
For each request it:
- validates the OAuth token by calling the provider, with no caching, so a revoked token is rejected immediately;
- maps the identity to an internal user in PostgreSQL;
- evaluates authorization policy (explicit allow-lists, no implicit access);
- signs the certificate with the active CA key;
- records the event for audit.
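The per-request sequence described above maps naturally onto a single function. This is a minimal sketch, not BeaconSSH’s actual code: `SigningRequest`, `Denied`, `issue_certificate`, and the injected callables (`introspect`, `lookup_user`, `allowed`, `sign`, `audit`) are hypothetical names, and the real provider call, database lookup, and CA signing are stubbed out as parameters so the control flow stands on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class SigningRequest:
    """Hypothetical shape of a signing request."""
    access_token: str
    public_key: str                 # OpenSSH public key line, e.g. "ssh-ed25519 AAAA..."
    principals: Tuple[str, ...]     # login names the certificate should be valid for
    host_group: Optional[str] = None


class Denied(Exception):
    """Raised when any step rejects the request; nothing is signed."""


def issue_certificate(
    req: SigningRequest,
    introspect: Callable[[str], Optional[str]],          # token -> OAuth subject, None if invalid/revoked
    lookup_user: Callable[[str], Optional[str]],         # OAuth subject -> internal user id
    allowed: Callable[[str, str, Optional[str]], bool],  # (user, principal, host_group) -> bool
    sign: Callable[[SigningRequest, str], str],          # signs with the active CA key
    audit: Callable[[dict], None],                       # appends an audit event
) -> str:
    # 1. Ask the OAuth provider on every request; no cached answers.
    subject = introspect(req.access_token)
    if subject is None:
        raise Denied("invalid or revoked token")

    # 2. Map the external identity to an internal user.
    user = lookup_user(subject)
    if user is None:
        raise Denied("unknown user")

    # 3. Explicit allow-list: every requested principal must be granted.
    #    Assumption: an empty principal list is refused rather than signed.
    if not req.principals:
        raise Denied("no principals requested")
    for principal in req.principals:
        if not allowed(user, principal, req.host_group):
            raise Denied(f"{user} is not allowed principal {principal!r}")

    # 4. Sign, then 5. record the issuance.
    cert = sign(req, user)
    audit({"user": user, "principals": list(req.principals), "host_group": req.host_group})
    return cert
```

Because every dependency is passed in, each step can be tested in isolation, and a failure anywhere before step 4 means no certificate is produced.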
Why no token caching? Because revocation must take effect immediately. A cache with a 5-minute TTL means a revoked token can keep working for up to 5 minutes. For SSH access to production infrastructure, that window is unacceptable.
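A toy model makes the window concrete. `CachingValidator` is a hypothetical class, not part of BeaconSSH (which, per the text, does not cache at all); time is passed in explicitly so the behavior is deterministic.

```python
from typing import Callable, Dict, Tuple


class CachingValidator:
    """Hypothetical token validator that caches the provider's answer for `ttl` seconds."""

    def __init__(self, provider: Callable[[str], bool], ttl: float) -> None:
        self.provider = provider
        self.ttl = ttl
        self._cache: Dict[str, Tuple[bool, float]] = {}  # token -> (answer, time cached)

    def is_valid(self, token: str, now: float) -> bool:
        hit = self._cache.get(token)
        if hit is not None and now - hit[1] < self.ttl:
            return hit[0]  # cache hit: the provider is not consulted
        answer = self.provider(token)
        self._cache[token] = (answer, now)
        return answer


revoked: set = set()
validator = CachingValidator(lambda token: token not in revoked, ttl=300.0)

print(validator.is_valid("tok", now=0.0))    # True: provider asked, answer cached
revoked.add("tok")                           # the token is revoked at the provider
print(validator.is_valid("tok", now=200.0))  # True: stale cache hit, revoked token accepted
print(validator.is_valid("tok", now=300.0))  # False: entry expired, provider asked again
```

In the worst case a token is cached an instant before it is revoked and stays accepted for the full TTL. With `ttl=0` every call reaches the provider, which is the trade the certificate server makes: one provider round trip per signing request in exchange for no revocation lag.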
Redis is optionally used to track active certificates by fingerprint, using TTL-based entries. If Redis is down, certificates are still issued; the PostgreSQL audit log is the source of truth.
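The ordering matters: the audit write is mandatory and happens first, while the Redis write is best-effort. In this sketch an in-memory class stands in for Redis; `InMemoryTTLStore` and `record_issuance` are hypothetical, and setting each entry’s TTL to the certificate’s validity period is an assumption about what “TTL-based entries” means.

```python
import time
from typing import Callable, Dict, List, Optional


class InMemoryTTLStore:
    """Stand-in for Redis (hypothetical): key -> expiry timestamp."""

    def __init__(self) -> None:
        self._data: Dict[str, float] = {}
        self.available = True  # flip to False to simulate an outage

    def set_with_ttl(self, key: str, ttl_seconds: int, now: float) -> None:
        if not self.available:
            raise ConnectionError("tracker unavailable")
        self._data[key] = now + ttl_seconds

    def active(self, now: float) -> List[str]:
        # Expired entries simply stop counting, like a Redis key whose TTL elapsed.
        return sorted(k for k, expiry in self._data.items() if expiry > now)


def record_issuance(
    fingerprint: str,
    validity_seconds: int,
    audit_append: Callable[[dict], None],
    tracker: Optional[InMemoryTTLStore],
    now: Optional[float] = None,
) -> bool:
    """Write the audit record (mandatory), then track best-effort. Returns True if tracked."""
    now = time.time() if now is None else now
    # Source of truth: if this raises, the exception propagates and issuance should fail.
    audit_append({"fingerprint": fingerprint, "issued_at": now, "valid_for": validity_seconds})
    if tracker is None:
        return False
    try:
        tracker.set_with_ttl(f"cert:{fingerprint}", validity_seconds, now)
        return True
    except ConnectionError:
        # Tracking is optional: an outage must not block issuance.
        return False
```

With the redis-py client, the tracker call would be along the lines of `client.set(key, value, ex=ttl_seconds)`, and the handler would catch `redis.exceptions.RedisError` rather than the built-in `ConnectionError`.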
User CLI — Identity and Key Management
The user CLI handles the OAuth device authorization flow, generates and manages a persistent Ed25519 SSH key pair (the private key is always encrypted), requests certificates, and manages ssh-agent integration. Keys are loaded into the agent with a lifetime constraint matching the certificate’s validity, so the agent drops them automatically when the certificate expires.
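Matching the agent lifetime to the certificate comes down to a subtraction and one real `ssh-add` flag, `-t`. The function names, and the assumption that the client has the certificate’s expiry as a Unix timestamp (`valid_before`), are illustrative rather than BeaconSSH’s actual interface.

```python
from typing import List


def agent_lifetime(valid_before: int, now: int) -> int:
    """Seconds until the certificate expires; used as the agent lifetime."""
    remaining = valid_before - now
    if remaining <= 0:
        raise ValueError("certificate already expired; request a new one")
    return remaining


def ssh_add_command(key_path: str, valid_before: int, now: int) -> List[str]:
    # `ssh-add -t <seconds>` sets a maximum lifetime; ssh-agent deletes the
    # identity by itself once it elapses, so no cleanup job is needed.
    return ["ssh-add", "-t", str(agent_lifetime(valid_before, now)), key_path]


# Hypothetical usage; the key path is an example, not BeaconSSH's real layout.
# import os, subprocess, time
# key = os.path.expanduser("~/.ssh/beaconssh_ed25519")
# subprocess.run(ssh_add_command(key, cert_valid_before, int(time.time())), check=True)
```

Because the private key is encrypted, `ssh-add` prompts for its passphrase. It also loads a matching certificate automatically when one sits next to the key as `<key>-cert.pub`.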
Host CLI — Trust Distribution
This is intentionally the simplest component. It is not a daemon; it runs via cron. On each run it fetches the CA public key bundle over HTTPS, verifies its GPG signature, checks that the version number only ever increases (preventing rollback to an older bundle), and atomically writes the keys to the file referenced by sshd’s TrustedUserCAKeys option. It never handles private keys.
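One run of the host CLI reduces to verify, compare, write. In this sketch the bundle format (a JSON object with `version` and `keys`), the file paths, and all function names are hypothetical, and GPG verification is passed in as a callable so the example runs without a keyring; a real implementation would wrap a GPG tool pinned to the BeaconSSH signing key. The version is read from inside the signed payload: if it travelled unsigned, an attacker could replay an old bundle under a higher number.

```python
import json
import os
import tempfile
from typing import Callable


def atomic_write(path: str, data: bytes, mode: int = 0o644) -> None:
    """Write to a temp file in the same directory, fsync it, then rename over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.chmod(tmp, mode)
        os.replace(tmp, path)  # atomic on POSIX within one filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def read_installed_version(version_path: str) -> int:
    try:
        with open(version_path) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        return 0  # nothing installed yet


def install_ca_bundle(payload: bytes, signature: bytes, keys_path: str, version_path: str,
                      verify_signature: Callable[[bytes, bytes], bool]) -> bool:
    """Return True if a new bundle was installed, False if already current."""
    # 1. Authenticity first: nothing in an unverified payload is trusted.
    if not verify_signature(payload, signature):
        raise ValueError("signature check failed; refusing to install")
    # 2. Monotonic version check against what is on disk.
    bundle = json.loads(payload)
    version, keys = int(bundle["version"]), bundle["keys"]
    installed = read_installed_version(version_path)
    if version < installed:
        raise ValueError(f"rollback rejected: offered v{version}, installed v{installed}")
    if version == installed:
        return False
    # 3. Keys first, then the version marker.
    atomic_write(keys_path, ("\n".join(keys) + "\n").encode())
    atomic_write(version_path, f"{version}\n".encode())
    return True
```

Writing the keys before the version marker means a crash between the two writes leaves the old version number on disk, so the next cron run simply reinstalls the same bundle.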
Diagnostic Server — Isolated Observability
Physically and logically separated from the certificate server. Receives opt-in telemetry, validates schemas, enforces size limits. Cannot affect certificate issuance or authorization even if compromised.
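Schema validation with a hard size cap can be quite small. The 16 KiB limit, the field names, and `validate_telemetry` itself are hypothetical; the text only says the server validates schemas and enforces size limits. Rejecting unknown fields, not just checking known ones, keeps clients from smuggling extra data (a token, say) into telemetry.

```python
import json
from typing import Any, Dict

MAX_BYTES = 16 * 1024  # hypothetical size limit

# Hypothetical schema: field name -> required Python type.
SCHEMA: Dict[str, type] = {
    "client_version": str,
    "os": str,
    "event": str,
    "duration_ms": int,
}


def validate_telemetry(raw: bytes) -> Dict[str, Any]:
    # Enforce the size limit before parsing, so oversized bodies are cheap to reject.
    if len(raw) > MAX_BYTES:
        raise ValueError("payload too large")
    try:
        doc = json.loads(raw)
    except ValueError as exc:  # covers JSONDecodeError and bad UTF-8
        raise ValueError("not valid JSON") from exc
    if not isinstance(doc, dict):
        raise ValueError("payload must be a JSON object")
    unknown = set(doc) - set(SCHEMA)
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    for name, expected in SCHEMA.items():
        value = doc.get(name)
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, expected) or isinstance(value, bool):
            raise ValueError(f"field {name!r} missing or not {expected.__name__}")
    return doc
```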
Web Portal and Admin Dashboard
The portal gives users a read-only view of their certificate history. The admin dashboard provides operational control — user management, policies, CA lifecycle. All actions are logged as immutable audit events. Neither can bypass policy to directly grant certificates.
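The post does not say how immutability is enforced. One common technique, shown here purely as an illustration, is hash chaining: each entry commits to the hash of the entry before it, so altering or deleting a past entry invalidates every hash after it. `AuditLog` is a hypothetical in-memory class.

```python
import hashlib
import json
from typing import Any, Dict, List, Tuple


class AuditLog:
    """Append-only, hash-chained audit log (hypothetical sketch)."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        # Each entry: (previous entry's hash, canonical JSON body, this entry's hash)
        self._entries: List[Tuple[str, str, str]] = []

    def append(self, event: Dict[str, Any]) -> str:
        prev = self._entries[-1][2] if self._entries else self.GENESIS
        body = json.dumps(event, sort_keys=True, separators=(",", ":"))
        digest = hashlib.sha256((prev + body).encode()).hexdigest()
        self._entries.append((prev, body, digest))
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or removed entry breaks it."""
        prev = self.GENESIS
        for stored_prev, body, digest in self._entries:
            if stored_prev != prev:
                return False
            if hashlib.sha256((prev + body).encode()).hexdigest() != digest:
                return False
            prev = digest
        return True
```

Hash chaining is tamper-evident rather than tamper-proof: anyone able to rewrite the whole chain can recompute every hash, so the latest hash would typically be recorded somewhere the dashboard cannot write, with database permissions (no UPDATE or DELETE on the audit table) covering the rest.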
The Trust Model
| Component | Trusts | Does Not Trust |
|---|---|---|
| User CLI | OAuth provider, certificate server | The host, the network |
| Certificate server | OAuth provider, PostgreSQL | The user CLI, the host CLI |
| Host CLI | GPG signing key, HTTPS endpoint | Certificate server runtime |
| SSH host (sshd) | CA public key on local disk | Everything else |
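The critical insight: sshd trusts exactly one thing, the CA public key file on disk. Because that file is all it consults, hosts keep accepting valid certificates and rejecting invalid ones even when the certificate server is unreachable. This is what makes the system resilient.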
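sshd’s decision can be modelled as a pure function of local inputs: the trusted CA keys file, the presented certificate, and the clock. This is a simplified model, not OpenSSH’s code. `UserCert`, `load_trusted_ca_keys`, and `sshd_would_accept` are hypothetical names; certificate parsing and signature verification are omitted; and the check assumes no AuthorizedPrincipalsFile is configured, in which case sshd requires the login name to appear in the certificate’s principals.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class UserCert:
    # Simplified view of an OpenSSH user certificate. A real one must be parsed
    # from the wire format and its signature verified; this sketch skips both.
    signature_key: str          # the signing CA public key, "type base64"
    principals: FrozenSet[str]
    valid_after: int            # Unix seconds
    valid_before: int           # Unix seconds


def load_trusted_ca_keys(text: str) -> FrozenSet[str]:
    """Parse TrustedUserCAKeys-style content: one key per line, '#' comments allowed."""
    keys: Set[str] = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue  # malformed line; skip it
        keys.add(f"{parts[0]} {parts[1]}")  # drop any trailing comment
    return frozenset(keys)


def sshd_would_accept(cert: UserCert, login_user: str,
                      trusted: FrozenSet[str], now: int) -> bool:
    # Every input is local: no network call to the certificate server.
    return (
        cert.signature_key in trusted
        and cert.valid_after <= now < cert.valid_before
        and login_user in cert.principals
    )
```

Nothing in `sshd_would_accept` depends on BeaconSSH being reachable, which is the offline-enforcement principle in code form.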
What’s Next
The next post covers how the certificate server achieves stateless operation at scale — and why that matters more than it seems.