
Terraform Infrastructure

All infrastructure for this platform is defined as code in the infra/ directory using Terraform. A push to main that changes files under infra/ triggers the Terraform workflow (plan + manual approval + apply).

graph TD
    subgraph tf["Terraform (infra/)"]
        root["Root Module"]
        boot["Bootstrap"]
    end

    subgraph azure["Azure"]
        rg["Resource Group<br/>rg-kevinryan-io"]
        vnet["VNet 10.0.0.0/16"]
        node1["node1 — K3s Server<br/>Standard_B2s"]
        node2["node2 — K3s Agent<br/>Standard_B2s"]
        acr["Container Registry"]
        kv["Key Vault"]
        pg["PostgreSQL<br/>Flexible Server"]
        tfstate["State Storage<br/>rg-kevinryan-tfstate"]
    end

    subgraph ext["External"]
        cf["Cloudflare DNS<br/>5 zones"]
        gh["GitHub OIDC<br/>Federated Identity"]
    end

    root --> rg
    rg --> vnet
    vnet --> node1 & node2
    root --> acr & kv & pg
    root --> cf & gh
    boot --> tfstate
    node1 & node2 -.->|managed identity| acr & kv

The infrastructure is broken into seven reusable modules, orchestrated by a root module:

infra/
├── main.tf # Root orchestration, providers, backend
├── variables.tf # Root input variables
├── outputs.tf # Root outputs (IPs, ACR, OIDC values)
├── cloud-init-server.yaml # K3s server bootstrap template
├── cloud-init-agent.yaml # K3s agent bootstrap template
├── bootstrap/ # State backend bootstrap (run once)
│   └── main.tf
└── modules/
    ├── network/ # VNet, subnet, NSG, public IPs
    ├── compute/ # Linux VM with managed identity
    ├── registry/ # Azure Container Registry + AcrPull
    ├── keyvault/ # Key Vault with RBAC
    ├── postgresql/ # Flexible Server, private subnet, databases
    ├── cloudflare/ # DNS records + cache rules
    └── github-oidc/ # Azure AD app + federated credentials

The root module uses four providers:

| Provider | Version | Purpose |
|---|---|---|
| azurerm | ~> 4.0 | Azure resources (VMs, VNet, ACR, Key Vault, PostgreSQL) |
| azuread | ~> 3.0 | Azure AD application and service principal for OIDC |
| cloudflare | ~> 4.0 | DNS records and cache rules |
| random | ~> 3.0 | Password generation for K3s token, PostgreSQL, Umami, Grafana |
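
The version constraints in the table map onto a `required_providers` block. The following is a minimal sketch of what that block could look like; the registry source addresses are the standard ones for these providers, but the exact layout in infra/main.tf is an assumption.

```hcl
# Sketch only: the actual block in infra/main.tf may be organised differently.
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
    azuread = {
      source  = "hashicorp/azuread"
      version = "~> 3.0"
    }
    cloudflare = {
      source  = "cloudflare/cloudflare"
      version = "~> 4.0"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.0"
    }
  }
}
```

The `~>` operator permits newer minor releases within the same major version, so `~> 4.0` accepts any 4.x release but not 5.0.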

Terraform state is stored remotely in Azure Blob Storage:

| Setting | Value |
|---|---|
| Resource group | rg-kevinryan-tfstate |
| Storage account | krtfstate2026 |
| Container | tfstate |
| Key | kevinryan-io.tfstate |
| Versioning | Enabled |
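
These settings correspond to the arguments of the `azurerm` backend. A sketch of the backend block under that assumption is shown here; the two authentication flags are assumptions (suggested by the OIDC setup described for GitHub Actions) and are not confirmed by the source.

```hcl
# Sketch of the remote state backend, using the values from the table above.
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-kevinryan-tfstate"
    storage_account_name = "krtfstate2026"
    container_name       = "tfstate"
    key                  = "kevinryan-io.tfstate"

    # Assumptions: authenticate with OIDC and Azure AD rather than an access key.
    use_oidc         = true
    use_azuread_auth = true
  }
}
```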

The bootstrap module (infra/bootstrap/) creates this storage account as a one-time setup:

resource "azurerm_resource_group" "tfstate" {
  name     = "rg-kevinryan-tfstate"
  location = "northeurope"
}

resource "azurerm_storage_account" "tfstate" {
  name                     = var.storage_account_name
  resource_group_name      = azurerm_resource_group.tfstate.name
  location                 = azurerm_resource_group.tfstate.location
  account_tier             = "Standard"
  account_replication_type = "LRS"

  blob_properties {
    versioning_enabled = true
  }
}

resource "azurerm_storage_container" "tfstate" {
  name                  = "tfstate"
  storage_account_id    = azurerm_storage_account.tfstate.id
  container_access_type = "private"
}

The network module creates the foundational Azure networking layer.

| Resource | Name | Details |
|---|---|---|
| Resource Group | rg-kevinryan-io | All resources live here |
| Virtual Network | vnet-kevinryan-io | 10.0.0.0/16 address space |
| Subnet | snet-kevinryan-io | 10.0.1.0/24 for VMs |
| Public IP (node1) | pip-kevinryan-io | Static, Standard SKU, Zone 1 |
| Public IP (node2) | pip-kevinryan-node2 | Static, Standard SKU, Zone 1 |
| NSG | nsg-kevinryan-io | Inbound rules below |

NSG inbound rules:

| Priority | Name | Port | Source |
|---|---|---|---|
| 100 | AllowHTTP | 80 | Any |
| 110 | AllowHTTPS | 443 | Any |
| 120 | AllowSSH | 22 | Admin IP only |

SSH is restricted to a single admin IP address (passed via var.admin_ip), while HTTP and HTTPS are open to the internet for Cloudflare-proxied traffic.
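
As an illustration, the SSH rule could be expressed as a standalone `azurerm_network_security_rule`. This is a sketch: `var.admin_ip` comes from the text above, while the `resource_group_name` and `nsg_name` variables are hypothetical stand-ins for however the module references its resource group and NSG.

```hcl
variable "admin_ip" {
  type        = string
  description = "Single admin IP address allowed to reach port 22"
}

# Hypothetical inputs, used here only to keep the sketch self-contained.
variable "resource_group_name" { type = string }
variable "nsg_name" { type = string }

resource "azurerm_network_security_rule" "allow_ssh" {
  name                        = "AllowSSH"
  priority                    = 120
  direction                   = "Inbound"
  access                      = "Allow"
  protocol                    = "Tcp"
  source_port_range           = "*"
  destination_port_range      = "22"
  source_address_prefix       = var.admin_ip # only the admin IP may connect
  destination_address_prefix  = "*"
  resource_group_name         = var.resource_group_name
  network_security_group_name = var.nsg_name
}
```

The HTTP and HTTPS rules would follow the same shape with priorities 100 and 110 and `source_address_prefix = "*"`.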

The compute module creates Ubuntu Linux VMs with system-assigned managed identities. It is called twice: once for the K3s server node and once for the agent node.

| Setting | Value |
|---|---|
| OS | Ubuntu 24.04 LTS (Canonical/ubuntu-24_04-lts/server) |
| Size | Standard_B2s (2 vCPUs, 4 GB RAM) |
| Disk | 30 GB Standard LRS |
| Zone | 1 |
| Identity | System-assigned managed identity |

Each VM receives a cloud-init template at creation time that bootstraps the K3s cluster. The custom_data attribute is listed under lifecycle ignore_changes, so Terraform doesn’t recreate the VMs when the cloud-init template changes. As a consequence, template edits only take effect on VMs created afterwards.
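
A sketch of how the compute module's VM resource might combine the settings above with the `ignore_changes` rule. The size, disk, zone, image and identity values come from the table; every variable name is a hypothetical module input, not taken from the source.

```hcl
# Hypothetical module inputs.
variable "name" { type = string }
variable "resource_group_name" { type = string }
variable "location" { type = string }
variable "admin_username" { type = string }
variable "ssh_public_key" { type = string }
variable "network_interface_id" { type = string }
variable "cloud_init" { type = string } # rendered cloud-init template

resource "azurerm_linux_virtual_machine" "vm" {
  name                  = var.name
  resource_group_name   = var.resource_group_name
  location              = var.location
  size                  = "Standard_B2s"
  zone                  = "1"
  admin_username        = var.admin_username
  network_interface_ids = [var.network_interface_id]
  custom_data           = base64encode(var.cloud_init)

  admin_ssh_key {
    username   = var.admin_username
    public_key = var.ssh_public_key
  }

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
    disk_size_gb         = 30
  }

  source_image_reference {
    publisher = "Canonical"
    offer     = "ubuntu-24_04-lts"
    sku       = "server"
    version   = "latest"
  }

  identity {
    type = "SystemAssigned"
  }

  lifecycle {
    # A changed template would otherwise force VM replacement.
    ignore_changes = [custom_data]
  }
}
```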

The server node’s cloud-init performs:

  1. Install Azure CLI and authenticate with managed identity
  2. Retrieve the K3s token from Key Vault (with retry loop — up to 5 minutes)
  3. Install K3s in server mode
  4. Set up ACR credential refresh (systemd timer, every 2 hours)
  5. Install Flux CLI and bootstrap Flux CD from the GitHub repository

# Key excerpt from cloud-init-server.yaml
- |
  export GITHUB_TOKEN="${github_token}"
  flux bootstrap github \
    --kubeconfig=/etc/rancher/k3s/k3s.yaml \
    --owner=DevOpsKev \
    --repository=kevin-ryan-platform \
    --branch=main \
    --path=k8s/flux-system \
    --personal \
    --token-auth \
    --components=source-controller,kustomize-controller,helm-controller

The agent node’s cloud-init performs:

  1. Install Azure CLI and authenticate with managed identity
  2. Retrieve the K3s token from Key Vault (with retry loop)
  3. Install K3s in agent mode, joining the server via its private IP
  4. Set up ACR credential refresh (systemd timer, every 2 hours)

The agent node is configured with a taint and label for observability workloads:

INSTALL_K3S_EXEC="--node-taint observability=true:NoSchedule --node-label role=observability"

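The taint keeps pods without a matching toleration off node2, and the label gives the observability stack (Grafana, Loki, VictoriaMetrics) a node to select. Provided those workloads declare the toleration and select the role=observability label, they run only on node2, and site workloads stay isolated on node1.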

Both nodes run a systemd timer that refreshes ACR authentication every 2 hours. ACR tokens expire after 3 hours, so the 2-hour interval ensures continuity. The script uses Azure CLI with the VM’s managed identity to obtain a fresh token and writes it to the K3s registries configuration:

mirrors:
  "<acr>.azurecr.io":
    endpoint:
      - "https://<acr>.azurecr.io"
configs:
  "<acr>.azurecr.io":
    auth:
      username: "00000000-0000-0000-0000-000000000000"
      password: "<token>"

There is a potential circular dependency: the Key Vault module needs VM principal IDs (to grant RBAC), and the cloud-init templates need the Key Vault name (to retrieve secrets). This is resolved by passing the Key Vault name as a root-level variable (var.keyvault_name) rather than referencing the module output, breaking the Terraform dependency cycle.
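
The cycle-breaking pattern can be sketched in the root module as follows. Only `var.keyvault_name` and the template file name come from the source; the module input names (`cloud_init`, `name`, `vm_principal_ids`) and the `principal_id` output are hypothetical.

```hcl
variable "keyvault_name" {
  type        = string
  description = "Key Vault name, supplied directly so cloud-init does not depend on the keyvault module"
}

module "node1" {
  source = "./modules/compute"

  # The template reads the plain variable, not module.keyvault.name,
  # so node1 has no dependency on the keyvault module.
  cloud_init = templatefile("${path.module}/cloud-init-server.yaml", {
    keyvault_name = var.keyvault_name
  })
}

module "keyvault" {
  source = "./modules/keyvault"

  name = var.keyvault_name

  # The keyvault module depends on the VM, never the other way round.
  vm_principal_ids = {
    node1 = module.node1.principal_id
  }
}
```

Had the template used a keyvault module output instead, Terraform would report a cycle, because each module would need the other to be created first.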

The registry module creates an Azure Container Registry and grants AcrPull to both VM managed identities:

| Setting | Value |
|---|---|
| SKU | Basic |
| Admin enabled | No |
| Role assignments | AcrPull for node1 and node2 |

GitHub Actions pushes images via the AcrPush role (granted by the github-oidc module). VMs pull images via AcrPull using their managed identities.
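
A minimal sketch of the registry and its AcrPull assignments, assuming the VM principal IDs are passed in as a map. The variable names are hypothetical; the SKU, admin setting and role name come from the table above.

```hcl
# Hypothetical module inputs.
variable "acr_name" { type = string }
variable "resource_group_name" { type = string }
variable "location" { type = string }

variable "vm_principal_ids" {
  type        = map(string)
  description = "Managed identity principal IDs keyed by node name, e.g. node1 and node2"
}

resource "azurerm_container_registry" "acr" {
  name                = var.acr_name
  resource_group_name = var.resource_group_name
  location            = var.location
  sku                 = "Basic"
  admin_enabled       = false
}

# One AcrPull assignment per VM identity.
resource "azurerm_role_assignment" "acr_pull" {
  for_each = var.vm_principal_ids

  scope                = azurerm_container_registry.acr.id
  role_definition_name = "AcrPull"
  principal_id         = each.value
}
```

Keying the map by node name keeps the `for_each` keys known at plan time even though the principal IDs themselves are only known after the VMs exist.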

The keyvault module creates an Azure Key Vault with RBAC authorization (no access policies):

| Setting | Value |
|---|---|
| SKU | Standard |
| RBAC enabled | Yes |
| Purge protection | Disabled |

Role assignments on the vault:

| Principal | Role | Purpose |
|---|---|---|
| node1 + node2 (managed identity) | Key Vault Secrets User | Read secrets at runtime via External Secrets Operator |
| Terraform caller | Key Vault Secrets Officer | Create and manage secrets during terraform apply |

Terraform generates and stores these secrets in Key Vault:

| Secret | Source | Consumer |
|---|---|---|
| k3s-token | random_password (48 chars) | Cloud-init on both nodes |
| pg-admin-password | random_password (32 chars) | PostgreSQL, External Secrets |
| pg-fqdn | PostgreSQL module output | External Secrets |
| pg-admin-username | PostgreSQL module output | External Secrets |
| umami-app-secret | random_password (64 chars) | External Secrets → Umami |
| grafana-admin-password | random_password (32 chars) | External Secrets → Grafana |
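
The generate-and-store pattern for one of these secrets might look like the sketch below. The secret name and length come from the table; the `key_vault_id` variable and the `special = false` setting are assumptions.

```hcl
# Hypothetical input: the ID of the Key Vault created by the keyvault module.
variable "key_vault_id" { type = string }

resource "random_password" "k3s_token" {
  length  = 48
  special = false # assumption: avoids characters that need shell quoting
}

resource "azurerm_key_vault_secret" "k3s_token" {
  name         = "k3s-token"
  value        = random_password.k3s_token.result
  key_vault_id = var.key_vault_id
}
```

Because `random_password` stores its result in state, the value stays stable across applies and is only regenerated if the resource is replaced.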

The postgresql module provisions an Azure Database for PostgreSQL Flexible Server on a private subnet:

| Setting | Value |
|---|---|
| SKU | B_Standard_B1ms (burstable, 1 vCore, 2 GB RAM) |
| Version | PostgreSQL 16 |
| Storage | 32 GB (auto-grow enabled) |
| Backup | 7-day retention, no geo-redundancy |
| Network | Private subnet 10.0.2.0/28, private DNS zone |
| Public access | Disabled |
| Extensions | PGCRYPTO (required by Umami) |

Databases created on the server:

| Database | Charset | Collation | Consumer |
|---|---|---|---|
| umami_db | UTF8 | en_US.utf8 | Umami analytics |
| grafana_db | UTF8 | en_US.utf8 | Grafana |

The PostgreSQL server is deployed to a delegated subnet (snet-postgresql) with a private DNS zone (privatelink.postgres.database.azure.com) linked to the VNet. This means the database is only accessible from within the VNet — no public internet exposure.
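
The private networking described above could be sketched with a delegated subnet, a private DNS zone, and a VNet link. The subnet name, address range and zone name come from the text; the variables and the link name are hypothetical.

```hcl
# Hypothetical module inputs.
variable "resource_group_name" { type = string }
variable "virtual_network_name" { type = string }
variable "virtual_network_id" { type = string }

resource "azurerm_subnet" "postgresql" {
  name                 = "snet-postgresql"
  resource_group_name  = var.resource_group_name
  virtual_network_name = var.virtual_network_name
  address_prefixes     = ["10.0.2.0/28"]

  # Delegation reserves the subnet for PostgreSQL Flexible Server.
  delegation {
    name = "postgresql"

    service_delegation {
      name    = "Microsoft.DBforPostgreSQL/flexibleServers"
      actions = ["Microsoft.Network/virtualNetworks/subnets/join/action"]
    }
  }
}

resource "azurerm_private_dns_zone" "postgresql" {
  name                = "privatelink.postgres.database.azure.com"
  resource_group_name = var.resource_group_name
}

# Linking the zone to the VNet lets the K3s nodes resolve the server's FQDN.
resource "azurerm_private_dns_zone_virtual_network_link" "postgresql" {
  name                  = "postgresql-vnet-link" # hypothetical name
  resource_group_name   = var.resource_group_name
  private_dns_zone_name = azurerm_private_dns_zone.postgresql.name
  virtual_network_id    = var.virtual_network_id
}
```

The Flexible Server resource would then reference the subnet and zone through its `delegated_subnet_id` and `private_dns_zone_id` arguments.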

The cloudflare module manages DNS records and cache rules for each domain. It is called once per domain zone:

| Domain | Zone Variable | Subdomains |
|---|---|---|
| kevinryan.io | cloudflare_zone_id | brand, docs |
| aiimmigrants.com | cloudflare_zone_id_aiimmigrants | |
| specmcp.ai | cloudflare_zone_id_specmcp | |
| sddbook.com | cloudflare_zone_id_sddbook | |
| distributedequity.org | cloudflare_zone_id_distributedequity | |

Additionally, analytics and monitoring A records are created directly in the root module for analytics.kevinryan.io and monitoring.kevinryan.io.

For each domain, the module creates:

  • Root (@) — A record pointing to node1’s public IP, Cloudflare proxied
  • www — A record pointing to node1’s public IP, Cloudflare proxied
  • Subdomains — A records for each entry in the subdomains list

All records are proxied through Cloudflare (orange cloud), providing DDoS protection and CDN caching.
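
One way to generate the records listed above is a single `cloudflare_record` resource with `for_each`. This is a sketch: the variable names are hypothetical, and the `content` argument assumes a recent 4.x provider release (earlier 4.x releases use `value` instead).

```hcl
# Hypothetical module inputs.
variable "zone_id" { type = string }
variable "node1_public_ip" { type = string }

variable "subdomains" {
  type    = list(string)
  default = []
}

resource "cloudflare_record" "a" {
  # Root, www, and every configured subdomain get the same A record.
  for_each = toset(concat(["@", "www"], var.subdomains))

  zone_id = var.zone_id
  name    = each.value
  type    = "A"
  content = var.node1_public_ip
  proxied = true # orange cloud
}
```

For kevinryan.io this would yield four records: `@`, `www`, `brand` and `docs`.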

Each domain gets a Cloudflare ruleset in the http_request_cache_settings phase:

  • Edge TTL: 86,400 seconds (24 hours), overriding origin headers
  • Serve stale: Enabled — Cloudflare serves cached content if the origin is down

This is applied to all hostnames for the domain (root, www, and subdomains) via a dynamically constructed expression.
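
The dynamically constructed expression and the cache rule could be sketched as below. The phase, TTL and serve-stale intent come from the text; the variable names, ruleset name and exact expression format are assumptions, and the block syntax follows the 4.x provider's `cloudflare_ruleset` resource, so it is worth checking against the provider documentation for the pinned version.

```hcl
# Hypothetical module inputs.
variable "zone_id" { type = string }
variable "domain" { type = string }

variable "subdomains" {
  type    = list(string)
  default = []
}

locals {
  hostnames = concat(
    [var.domain, "www.${var.domain}"],
    [for s in var.subdomains : "${s}.${var.domain}"]
  )

  # e.g. (http.host eq "example.org") or (http.host eq "www.example.org")
  cache_expression = join(" or ", [
    for h in local.hostnames : "(http.host eq \"${h}\")"
  ])
}

resource "cloudflare_ruleset" "cache" {
  zone_id = var.zone_id
  name    = "cache-${var.domain}" # hypothetical name
  kind    = "zone"
  phase   = "http_request_cache_settings"

  rules {
    action     = "set_cache_settings"
    expression = local.cache_expression
    enabled    = true

    action_parameters {
      cache = true

      edge_ttl {
        mode    = "override_origin"
        default = 86400 # 24 hours
      }

      serve_stale {
        disable_stale_while_updating = false
      }
    }
  }
}
```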

The github-oidc module configures passwordless authentication between GitHub Actions and Azure using OpenID Connect:

| Resource | Name |
|---|---|
| Application | github-actions-kevinryan-io |
| Service Principal | Created from the application |

Federated credentials:

| Credential | Subject |
|---|---|
| main-branch | repo:DevOpsKev/kevin-ryan-platform:ref:refs/heads/main |
| production-env | repo:DevOpsKev/kevin-ryan-platform:environment:production |

The main-branch credential allows deploy workflows to authenticate. The production-env credential allows the Terraform apply job (which runs in the production GitHub environment) to authenticate.
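
A sketch of the application, service principal and both federated credentials using the azuread 3.x resources. The display name and subjects come from the tables above; the issuer and audience are the standard values for GitHub Actions OIDC with Azure, and the resource labels and the `local.subjects` map are hypothetical.

```hcl
locals {
  # Credential name => OIDC subject claim that GitHub puts in the token.
  subjects = {
    "main-branch"    = "repo:DevOpsKev/kevin-ryan-platform:ref:refs/heads/main"
    "production-env" = "repo:DevOpsKev/kevin-ryan-platform:environment:production"
  }
}

resource "azuread_application" "github" {
  display_name = "github-actions-kevinryan-io"
}

resource "azuread_service_principal" "github" {
  client_id = azuread_application.github.client_id
}

resource "azuread_application_federated_identity_credential" "github" {
  for_each = local.subjects

  application_id = azuread_application.github.id
  display_name   = each.key
  audiences      = ["api://AzureADTokenExchange"]
  issuer         = "https://token.actions.githubusercontent.com"
  subject        = each.value
}
```

Azure only exchanges a GitHub token whose subject matches one of these strings exactly, which is why the branch and the environment each need their own credential.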

The service principal is granted the following roles:

| Scope | Role | Purpose |
|---|---|---|
| ACR | AcrPush | Push Docker images from CI |
| Resource group | Contributor | Manage resources during Terraform apply |
| State storage account | Storage Blob Data Contributor | Read/write Terraform state |
| State resource group | Reader | terraform init reads storage account properties |

The root module exports values needed for GitHub Actions configuration:

| Output | Description |
|---|---|
| github_actions_client_id | AZURE_CLIENT_ID secret for workflows |
| github_actions_tenant_id | AZURE_TENANT_ID secret for workflows |
| github_actions_subscription_id | AZURE_SUBSCRIPTION_ID secret for workflows |
| github_actions_acr_name | ACR_NAME secret for workflows |
| github_actions_acr_login_server | ACR_LOGIN_SERVER secret for workflows |
| node1_public_ip | Public IP of the K3s server node |
| node2_public_ip | Public IP of the K3s agent node |
| postgresql_fqdn | FQDN of the PostgreSQL Flexible Server |
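
A few of these outputs, sketched as they might appear in outputs.tf. The output names come from the table; the module output attributes on the right-hand side (`client_id`, `node1_public_ip`, `fqdn`) are hypothetical.

```hcl
# Sketch only: assumes modules named github_oidc, network and postgresql
# are declared in the same root module.
output "github_actions_client_id" {
  description = "AZURE_CLIENT_ID secret for workflows"
  value       = module.github_oidc.client_id
}

output "node1_public_ip" {
  description = "Public IP of the K3s server node"
  value       = module.network.node1_public_ip
}

output "postgresql_fqdn" {
  description = "FQDN of the PostgreSQL Flexible Server"
  value       = module.postgresql.fqdn
}
```

After an apply, `terraform output github_actions_client_id` prints the value to copy into the repository's secrets.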

The module dependency order during terraform apply:

graph TD
    network["network<br/>(VNet, IPs, NSG)"]
    node1["node1<br/>(K3s server)"]
    node2["node2<br/>(K3s agent)"]
    kv["keyvault"]
    pg["postgresql"]
    reg["registry<br/>(ACR)"]
    cf["cloudflare<br/>(×5 zones)"]
    oidc["github-oidc"]

    network --> node1 & node2
    network --> pg
    node1 & node2 --> kv
    node1 & node2 --> reg
    node1 --> node2
    network --> cf
    reg --> oidc

The node1 → node2 dependency exists because the agent’s cloud-init references module.node1.private_ip_address to join the K3s cluster.
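
That implicit dependency can be sketched in the root module as follows. The `module.node1.private_ip_address` reference and the template file name come from the source; the `cloud_init` input and the `server_ip` template variable are hypothetical names.

```hcl
module "node1" {
  source = "./modules/compute"

  cloud_init = templatefile("${path.module}/cloud-init-server.yaml", {})
}

module "node2" {
  source = "./modules/compute"

  # Referencing a node1 output makes Terraform create node1 first;
  # no explicit depends_on is needed.
  cloud_init = templatefile("${path.module}/cloud-init-agent.yaml", {
    server_ip = module.node1.private_ip_address
  })
}
```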