platform / internal tooling

The toolkit behind
the team.

We run our own platform on the same patterns we ship to clients: GitOps governance, dual-engine secret scanning, SOC observability, automated incident response. Each tool is config-driven and battle-tested in production before it ever reaches a client environment.

Book an intro call → See the stack ↓

caelicode/platform · live

tools: 6

uptime: 99.99%

deploy cadence: daily

tools / 01-06

Six tools.
One operating model.

Every tool below is deployed via ArgoCD, tested in CI, monitored by the SOC stack, and documented with runbooks. When we ship these to client environments, the only change is the config values.

01 · status-page

Incident Automation

Synthetic checks, automated incident creation, status page updates, and structured postmortems.

test coverage: 216 tests

check interval: 30s

incident MTTR: < 7m p50

Synthetic monitoring

HTTP, TCP, DNS, and TLS checks from 3 regions. Configurable thresholds and alert escalation.
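One way to read "configurable thresholds" across several regions is a quorum rule: open an incident only when enough regions fail at once, so a single flaky probe does not page anyone. The sketch below is illustrative only. The `CheckResult` shape, the 2000 ms latency limit, and the two-region quorum are assumptions, not the tool's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    region: str        # hypothetical region label
    ok: bool           # did the probe complete successfully?
    latency_ms: float  # observed round-trip time


def is_failing(result, max_latency_ms):
    """A probe fails if it errored or exceeded the latency threshold."""
    return (not result.ok) or result.latency_ms > max_latency_ms


def should_open_incident(results, max_latency_ms=2000.0, min_failing_regions=2):
    """Open an incident only when a quorum of distinct regions is failing."""
    failing = {r.region for r in results if is_failing(r, max_latency_ms)}
    return len(failing) >= min_failing_regions
```

With three regions and a quorum of two, one slow region alone is ignored, while one hard failure plus one slow region opens an incident.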

Incident lifecycle

Auto-create, auto-notify, auto-resolve. PagerDuty and Slack integrations. Timeline generated automatically.

Postmortems

Structured templates, action item tracking, SLO impact scoring. Published internally within 5 business days.

Status page

Public-facing component status, maintenance windows, subscriber notifications. Zero manual updates during incidents.

02 · secret-scanner

Secret Scanning

Dual-engine scanning (gitleaks + trufflehog) with incremental mode and CI-blocking gates.

engines: gitleaks + trufflehog

custom rules: 22+

mode: incremental, CI-blocking

Dual-engine approach

Two scanners with different detection strategies, so each catches what the other misses. Combined false-positive rate under 2%.
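Combining two engines usually means taking the union of their findings and collapsing the overlap. A minimal sketch, assuming each engine's output has already been normalized into a hypothetical `Finding` record (the real gitleaks and trufflehog report formats differ and are not modeled here):

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    engine: str  # which scanner reported it
    path: str    # file containing the match
    line: int    # line number of the match
    secret: str  # matched secret text


def fingerprint(finding):
    """Identify a finding by location plus a hash of the secret, never the raw secret."""
    digest = hashlib.sha256(finding.secret.encode("utf-8")).hexdigest()
    return (finding.path, finding.line, digest)


def merge_findings(*engine_outputs):
    """Union findings from all engines; map each fingerprint to the engines that saw it."""
    merged = {}
    for findings in engine_outputs:
        for finding in findings:
            merged.setdefault(fingerprint(finding), set()).add(finding.engine)
    return merged
```

A fingerprint reported by both engines is a high-confidence hit; one reported by a single engine is exactly the case the second scanner exists for.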

Incremental scanning

Only scans new commits, not full history on every run. PR-blocking in under 8 seconds for typical diffs.

Custom rules

22+ org-specific patterns: internal tokens, service account keys, webhook URLs, config secrets.

Routing & SLAs

Findings routed to code owners. Critical: 4h SLA. High: 24h. Medium: 1 sprint. Low: best-effort.
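The SLA tiers above translate directly into a deadline lookup. This is a sketch under one stated assumption: a sprint is taken to be 14 days, which the page does not specify. `SLA_WINDOWS` and `sla_deadline` are illustrative names.

```python
from datetime import datetime, timedelta, timezone

SPRINT_LENGTH = timedelta(days=14)  # assumption: two-week sprints

SLA_WINDOWS = {
    "critical": timedelta(hours=4),
    "high": timedelta(hours=24),
    "medium": SPRINT_LENGTH,
    "low": None,  # best-effort: no deadline
}


def sla_deadline(severity, found_at):
    """Return the remediation deadline for a finding, or None for best-effort."""
    window = SLA_WINDOWS[severity.lower()]
    return None if window is None else found_at + window
```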

03 · org-governance

Org Governance

YAML-driven GitHub org policy with nightly drift detection and auto-remediation.

run cadence: nightly

drift detection: < 5s

policy format: declarative YAML

Declarative config

Repo settings, branch protections, team memberships, and webhook configs defined in YAML. PRs for all changes.

Drift detection

Nightly reconciliation against desired state. Drift reported as issues, auto-fixed when safe, flagged for review otherwise.
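Reconciliation of this kind comes down to comparing desired state with observed state and deciding, per setting, whether a fix is safe to apply unattended. A minimal sketch, assuming both states are already loaded as flat dictionaries; the setting names and the `SAFE_TO_AUTOFIX` allowlist are illustrative, not the tool's real policy.

```python
# Hypothetical allowlist: settings considered safe to correct without review.
SAFE_TO_AUTOFIX = {"delete_branch_on_merge", "has_wiki"}


def detect_drift(desired, actual):
    """Compare desired vs. actual settings and classify each difference."""
    drift = []
    for setting, want in desired.items():
        have = actual.get(setting)
        if have != want:
            drift.append({
                "setting": setting,
                "desired": want,
                "actual": have,
                # Auto-fix only allowlisted settings; everything else needs a human.
                "action": "auto_fix" if setting in SAFE_TO_AUTOFIX else "review",
            })
    return drift
```

An empty result means the org matches its declared state; anything tagged `review` would become an issue for a person to look at.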

Audit trail

Every policy change is a merged PR. Full git history of who changed what, when, and why.

Multi-org support

Runs across our org and client orgs (with their PATs). Same policy engine, different config repos.

04 · soc-stack

SOC Observability

Grafana + Loki + Alertmanager with 7 production alert rules and on-call routing.

alert rules: 7 active

log retention: 30d hot, 1yr cold

integrations: pagerduty, slack, email

Dashboards

Pre-built Grafana dashboards for infrastructure health, deploy frequency, error budgets, and cost tracking.

Alert quality

Every alert is actionable, owned, and rare. Alert fatigue is treated as a bug, not a feature.

Log aggregation

Loki-based, label-indexed, fast. Structured logs with trace correlation. 30-day hot, 1-year cold storage.

On-call routing

Time-based and severity-based routing. Escalation policies. Quiet hours respected unless P1.
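The quiet-hours rule can be expressed as a small routing function. The sketch below assumes a 22:00 to 07:00 quiet window and three severity levels; the window, the `route` function, and the action names are hypothetical.

```python
from datetime import time

QUIET_START = time(22, 0)  # assumption: quiet hours begin at 22:00
QUIET_END = time(7, 0)     # assumption: quiet hours end at 07:00


def in_quiet_hours(now):
    """True if `now` falls inside the quiet window, which may wrap past midnight."""
    if QUIET_START <= QUIET_END:
        return QUIET_START <= now < QUIET_END
    return now >= QUIET_START or now < QUIET_END


def route(severity, now):
    """Pick a delivery action from severity and local time of day."""
    if severity == "P1":
        return "page"  # P1 always pages, quiet hours or not
    if in_quiet_hours(now):
        return "queue_until_morning"
    return "page" if severity == "P2" else "slack"
```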

05 · runner-infra

Runner Infrastructure

Self-hosted, autoscaling CI runners on spot capacity with cron dispatch.

scaling: 0 to N in < 30s

cost model: spot instances

dispatch: cron + event

Autoscaling

Scale-to-zero when idle, burst to capacity on push. No per-minute billing. Predictable monthly cost.

Security isolation

Ephemeral VMs, destroyed after each job. No shared state, no credential persistence, no lateral movement.

Cron dispatcher

Scheduled jobs (nightly builds, security scans, governance checks) managed as code, not UI clicks.

Audit trail

Full log of every job: who triggered, what ran, what artifacts produced. Retained 90 days.

06 · notify-bus

Notification Fabric

Unified alerting across email, Slack, PagerDuty, SMS, and webhooks. One config, all channels.

channels: 5 (email, slack, pd, sms, webhook)

routing: severity + team

dedup: content-hash based

Multi-channel

One event, routed to the right channel based on severity and team ownership. No duplicate noise.

Deduplication

Content-hash dedup prevents alert storms. Grouped notifications for related events.
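Content-hash deduplication can be sketched as hashing a canonical form of the event and suppressing repeats inside a time window. The 300-second window and the `Deduplicator` class are assumptions for illustration; the page does not describe the actual window or storage.

```python
import hashlib
import json


class Deduplicator:
    """Suppress events whose content hash was already sent within the window."""

    def __init__(self, window_seconds=300.0):
        self.window_seconds = window_seconds
        self._last_sent = {}  # content hash -> timestamp of last delivery

    @staticmethod
    def content_hash(event):
        # Sorted keys give a canonical form, so key order does not change the hash.
        canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def should_send(self, event, now):
        """Return True if the event should be delivered at time `now` (seconds)."""
        key = self.content_hash(event)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window_seconds:
            return False
        self._last_sent[key] = now
        return True
```

Volatile fields such as timestamps would need to be excluded from the hashed payload, otherwise every event looks unique.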

Templates

Channel-specific formatting. Slack gets rich blocks, email gets structured HTML, PagerDuty gets severity metadata.

Dead-letter queue

Failed deliveries retried with backoff. Persistent failures escalate to fallback channel. Nothing silently dropped.
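Retry-then-escalate can be sketched as a loop with exponential backoff that hands the message to a fallback once attempts run out. The attempt count, the base delay, and the `send`/`fallback` callables are hypothetical; the sleep function is injectable so the behaviour can be exercised without waiting.

```python
import time


def deliver_with_fallback(send, fallback, message,
                          max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Try `send` with exponential backoff; escalate to `fallback` if all attempts fail."""
    for attempt in range(max_attempts):
        try:
            send(message)
            return "delivered"
        except Exception:  # broad on purpose in this sketch: any failure triggers a retry
            if attempt < max_attempts - 1:
                sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    # Every attempt failed: hand off rather than drop the message.
    fallback(message)
    return "escalated"
```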

Want this platform
in your environment?

Every tool above is deployable to client infrastructure. Same code, different config. We'll have it running in your AWS/Azure/GCP account within the first sprint.

Start a conversation →

book intro · 30m · cal.com/caelicode
