
Provenance: Spec 0005 — Observability Stack

- Spec: .sdd/specification/spec-0005-observability-stack.md
- Executed: 2026-03-05
- Agent: Cursor (claude-4.6-opus)

  1. Read ADR-006 (docs/adr/adr-006-observability-grafana-loki-promtail.md) for architectural requirements
  2. Read existing infrastructure state: infra/main.tf, infra/modules/postgresql/main.tf, infra/modules/postgresql/variables.tf
  3. Confirmed grafana_db already exists in PostgreSQL module’s default databases list — no DB changes needed
  4. Read existing Flux/K8s patterns: k8s/flux-system/kustomization.yaml, k8s/flux-system/umami-sync.yaml, k8s/umami/externalsecret.yaml, k8s/umami/ingress.yaml
  5. Researched current Helm chart repos — discovered Grafana chart migrated to grafana-community repo (Jan 2026)
  6. Saved spec to .sdd/specification/spec-0005-observability-stack.md
  7. Modified infra/main.tf: added random_password.grafana_admin_password, azurerm_key_vault_secret.grafana_admin_password, cloudflare_record.monitoring
  8. Created k8s/observability/ directory
  9. Created k8s/observability/namespace.yaml
  10. Created k8s/observability/helmrepository-grafana.yaml (Loki + Promtail charts)
  11. Created k8s/observability/helmrepository-grafana-community.yaml (Grafana chart)
  12. Created k8s/observability/helmrelease-loki.yaml (SingleBinary mode, filesystem storage, 744h retention, node2 scheduling)
  13. Created k8s/observability/helmrelease-promtail.yaml (DaemonSet, tolerates observability taint)
  14. Created k8s/observability/externalsecret.yaml (GF_DATABASE_* + GF_SECURITY_ADMIN_PASSWORD from Key Vault)
  15. Created k8s/observability/helmrelease-grafana.yaml (community chart, envFromSecret, Loki datasource pre-configured, node2 scheduling)
  16. Created k8s/observability/ingress.yaml (Traefik IngressRoute for monitoring.kevinryan.io)
  17. Created k8s/flux-system/observability-sync.yaml (dependsOn: external-secrets-store)
  18. Modified k8s/flux-system/kustomization.yaml: added observability-sync.yaml to resources
  19. Ran terraform fmt -check -recursive infra/ — passed
  20. Ran pnpm lint — passed (0 errors, 2 pre-existing warnings)
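
Steps 17 and 18 hand the new directory to Flux. The sketch below shows one plausible shape for `k8s/flux-system/observability-sync.yaml`. It assumes Flux v2 with the `kustomize.toolkit.fluxcd.io/v1` API. The object name `observability`, the `interval`, and the `flux-system` GitRepository name are assumptions. The path and the `dependsOn` target come from the steps above. By default, `dependsOn` names refer to other Flux Kustomizations in the same namespace.

```yaml
# Hypothetical sketch of k8s/flux-system/observability-sync.yaml.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: observability          # assumed name
  namespace: flux-system
spec:
  # Do not reconcile until the external-secrets-store Kustomization is Ready,
  # so the ExternalSecret in k8s/observability/ has a store to read from.
  dependsOn:
    - name: external-secrets-store
  interval: 10m                # assumed reconcile interval
  path: ./k8s/observability
  prune: true                  # delete cluster objects removed from Git
  sourceRef:
    kind: GitRepository
    name: flux-system          # assumed bootstrap GitRepository name
```

Listing this file under `resources:` in `k8s/flux-system/kustomization.yaml` (step 18) is what makes Flux apply it. With `dependsOn` set, an unhealthy secret store holds the whole stack back. Flux does not apply a Grafana release whose Secret cannot yet be populated.
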

| Decision | Options Considered | Chosen | Rationale |
|---|---|---|---|
| Grafana chart repo | grafana.github.io/helm-charts vs grafana-community.github.io/helm-charts | grafana-community | Official Grafana chart migrated to community repo as of Jan 2026; old repo no longer receives updates |
| Two HelmRepository resources | Single repo vs separate repos for Grafana and Loki/Promtail | Separate | Loki and Promtail remain on grafana.github.io/helm-charts; only the Grafana chart moved to community repo |
| Loki schema version | v12 vs v13 | v13 with TSDB store | v13 is the current recommended schema for new deployments |
| Loki persistence size | 5Gi vs 10Gi vs 20Gi | 10Gi | 31-day retention with low-traffic portfolio sites; 10Gi provides headroom without waste |
| Disabled Loki sub-components | Selective vs exhaustive zeroing | Exhaustive (all 12 component types set to replicas: 0) | Avoids chart validation errors; explicit is better than implicit |
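
The first two decisions become two Flux `HelmRepository` sources. Here is a minimal sketch. It assumes Flux v2.3 or later, where `source.toolkit.fluxcd.io/v1` is served, and an `https://` scheme for both URLs. The object names and `interval` values are assumptions. The hosts and the chart split come from the table.

```yaml
# helmrepository-grafana.yaml: source for the loki and promtail charts
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: grafana                # assumed name
  namespace: observability
spec:
  interval: 1h                 # assumed index refresh interval
  url: https://grafana.github.io/helm-charts
---
# helmrepository-grafana-community.yaml: source for the grafana chart only
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: grafana-community      # assumed name
  namespace: observability
spec:
  interval: 1h                 # assumed index refresh interval
  url: https://grafana-community.github.io/helm-charts
```

Each HelmRelease selects its source through `spec.chart.spec.sourceRef`. That reference is what lets the Loki/Promtail releases and the Grafana release pull from different repositories.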
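
The Loki rows (v13 on TSDB, a 10Gi volume, exhaustive zeroing) combine with the SingleBinary and 744h settings (31 days × 24 h) from the checks. Together they might look roughly like this in `helmrelease-loki.yaml`. This sketch assumes the `grafana/loki` chart's 6.x values layout and Flux's `helm.toolkit.fluxcd.io/v2` API. The schema `from` date, `auth_enabled: false`, the `node2` hostname label, and the `observability` taint key are all assumptions. The record does not list which twelve components were zeroed, so only a subset is shown.

```yaml
# Hypothetical sketch of k8s/observability/helmrelease-loki.yaml.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: loki
  namespace: observability
spec:
  interval: 30m                  # assumed
  chart:
    spec:
      chart: loki
      sourceRef:
        kind: HelmRepository
        name: grafana            # assumed name of the grafana.github.io source
  values:
    deploymentMode: SingleBinary
    loki:
      auth_enabled: false        # assumed: single-tenant
      commonConfig:
        replication_factor: 1    # each stream is written to a single ingester
      storage:
        type: filesystem         # chunks and index live on the PVC
      schemaConfig:
        configs:
          - from: "2026-03-01"   # assumed start date for this schema period
            store: tsdb
            object_store: filesystem
            schema: v13
            index:
              prefix: index_
              period: 24h        # TSDB requires a 24h index period
      limits_config:
        retention_period: 744h   # 31 days
      compactor:
        retention_enabled: true  # the compactor enforces retention_period
        delete_request_store: filesystem
    singleBinary:
      replicas: 1
      persistence:
        size: 10Gi
      nodeSelector:
        kubernetes.io/hostname: node2   # assumed label for node2
      tolerations:
        - key: observability            # assumed taint key
          operator: Exists
          effect: NoSchedule
    # Zero the non-SingleBinary targets; the record zeroes 12 component
    # types, and only a subset is shown here.
    backend:
      replicas: 0
    read:
      replicas: 0
    write:
      replicas: 0
    ingester:
      replicas: 0
    querier:
      replicas: 0
    distributor:
      replicas: 0
```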

No deviations from spec.

| File | Status |
|---|---|
| .sdd/specification/spec-0005-observability-stack.md | Created |
| infra/main.tf | Modified (3 resources added) |
| k8s/observability/namespace.yaml | Created |
| k8s/observability/helmrepository-grafana.yaml | Created |
| k8s/observability/helmrepository-grafana-community.yaml | Created |
| k8s/observability/helmrelease-loki.yaml | Created |
| k8s/observability/helmrelease-promtail.yaml | Created |
| k8s/observability/externalsecret.yaml | Created |
| k8s/observability/helmrelease-grafana.yaml | Created |
| k8s/observability/ingress.yaml | Created |
| k8s/flux-system/observability-sync.yaml | Created |
| k8s/flux-system/kustomization.yaml | Modified |
| .sdd/provenance/spec-0005-observability-stack.provenance.md | Created |

| Check | Result |
|---|---|
| Spec saved to .sdd/specification/ | Pass |
| infra/main.tf contains 3 new resources | Pass |
| k8s/observability/ has 8 files | Pass |
| Loki: SingleBinary, replication_factor 1, filesystem, 744h, node2 | Pass |
| All non-SingleBinary replicas zeroed | Pass |
| Promtail: correct Loki URL, observability toleration | Pass |
| Grafana: envFromSecret, Loki datasource, root_url, node2 | Pass |
| ExternalSecret: GF_DATABASE_* + GF_SECURITY_ADMIN_PASSWORD | Pass |
| IngressRoute: monitoring.kevinryan.io, websecure, tls | Pass |
| Flux sync: dependsOn external-secrets-store | Pass |
| kustomization.yaml includes observability-sync.yaml | Pass |
| terraform fmt -check -recursive infra/ | Pass |
| pnpm lint | Pass (0 errors) |
| Provenance record exists | Pass |
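
The Promtail check covers two values. Here is a sketch of the relevant part of `helmrelease-promtail.yaml`, assuming the `grafana/promtail` chart's `config.clients` and `tolerations` values. The Loki Service name and port (`loki` on 3100, for a release named `loki`) are assumptions, and so is the taint key. The record only states that the URL and the toleration were verified.

```yaml
# Hypothetical sketch of k8s/observability/helmrelease-promtail.yaml.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: promtail
  namespace: observability
spec:
  interval: 30m                  # assumed
  chart:
    spec:
      chart: promtail
      sourceRef:
        kind: HelmRepository
        name: grafana            # assumed name of the grafana.github.io source
  values:
    config:
      clients:
        # Loki push endpoint; Service name and port are assumptions.
        - url: http://loki.observability.svc.cluster.local:3100/loki/api/v1/push
    # Lets the DaemonSet also schedule onto the tainted observability node,
    # so that node's container logs are collected too.
    tolerations:
      - key: observability       # assumed taint key
        operator: Exists
        effect: NoSchedule
```

Helm replaces lists rather than merging them. Setting `tolerations` therefore drops the chart's defaults, such as those for control-plane nodes. Repeat any default you still need.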
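
The Grafana check bundles four settings. The sketch below uses the Grafana chart's `envFromSecret`, `grafana.ini`, `datasources`, `nodeSelector`, and `tolerations` values. The Secret name `grafana-env`, the Loki URL, the node label, and the taint key are assumptions. The `root_url` follows from the `monitoring.kevinryan.io` host in the IngressRoute check.

```yaml
# Hypothetical sketch of k8s/observability/helmrelease-grafana.yaml.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: grafana
  namespace: observability
spec:
  interval: 30m                    # assumed
  chart:
    spec:
      chart: grafana
      sourceRef:
        kind: HelmRepository
        name: grafana-community    # assumed name of the community source
  values:
    # Every key in this Secret becomes a container env var (GF_DATABASE_*, ...).
    envFromSecret: grafana-env     # assumed Secret name
    grafana.ini:
      server:
        root_url: https://monitoring.kevinryan.io
    datasources:
      datasources.yaml:
        apiVersion: 1
        datasources:
          - name: Loki
            type: loki
            access: proxy
            url: http://loki.observability.svc.cluster.local:3100  # assumed
            isDefault: true
    nodeSelector:
      kubernetes.io/hostname: node2   # assumed label for node2
    tolerations:
      - key: observability            # assumed taint key
        operator: Exists
        effect: NoSchedule
```

Check one detail in the rendered Deployment. Kubernetes gives an explicit `env` entry precedence over the same key from `envFrom`. If the chart also injects its own `GF_SECURITY_ADMIN_PASSWORD`, that value wins over the one sourced from Key Vault.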
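
The ExternalSecret check can be pictured as an External Secrets Operator resource that templates Key Vault values into Grafana's environment variables. This sketch makes several assumptions:

- an ESO release that serves `external-secrets.io/v1` (older installs use `v1beta1` with the same fields);
- a `ClusterSecretStore` backed by Key Vault;
- hypothetical Key Vault secret names;
- port 5432 and `require` SSL mode.

`grafana_db` is the database the record says already exists in the PostgreSQL module.

```yaml
# Hypothetical sketch of k8s/observability/externalsecret.yaml.
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: grafana-env
  namespace: observability
spec:
  refreshInterval: 1h                 # assumed
  secretStoreRef:
    kind: ClusterSecretStore
    name: azure-key-vault             # hypothetical store name
  target:
    name: grafana-env                 # assumed; must match Grafana's envFromSecret
    template:
      engineVersion: v2
      data:
        GF_DATABASE_TYPE: postgres
        GF_DATABASE_HOST: "{{ .dbHost }}:5432"    # port assumed
        GF_DATABASE_NAME: grafana_db
        GF_DATABASE_USER: "{{ .dbUser }}"
        GF_DATABASE_PASSWORD: "{{ .dbPassword }}"
        GF_DATABASE_SSL_MODE: require             # assumed
        GF_SECURITY_ADMIN_PASSWORD: "{{ .adminPassword }}"
  data:
    # Remote keys below are hypothetical Key Vault secret names.
    - secretKey: adminPassword
      remoteRef:
        key: grafana-admin-password
    - secretKey: dbHost
      remoteRef:
        key: postgresql-host
    - secretKey: dbUser
      remoteRef:
        key: postgresql-admin-username
    - secretKey: dbPassword
      remoteRef:
        key: postgresql-admin-password
```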
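
The IngressRoute check corresponds to a Traefik CRD roughly like the one below. The sketch assumes Traefik v2.10+ or v3 (the `traefik.io/v1alpha1` API group) and a Grafana Service named `grafana` on the chart's default port 80. It also assumes TLS is served with Traefik's default certificate. The existing `k8s/umami/ingress.yaml` is the better guide for the exact `tls` block.

```yaml
# Hypothetical sketch of k8s/observability/ingress.yaml.
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: grafana
  namespace: observability
spec:
  entryPoints:
    - websecure                       # HTTPS entry point
  routes:
    - match: Host(`monitoring.kevinryan.io`)
      kind: Rule
      services:
        - name: grafana               # assumed Service name
          port: 80                    # assumed chart default Service port
  tls: {}                             # assumed: Traefik's default certificate
```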