Infrastructure
-
Day 11: What If We Put Clocks in Space?
In 1977, three years before GPS launched, the engineers building the satellites had to make a decision.
The clocks they were about to put in orbit were going to run faster than the clocks on the ground. By about 38 microseconds per day.
That sounds like nothing, but over 24 hours of GPS operation, an uncorrected clock would put you 11 kilometers off your actual position.
They had two options:
- Adjust the time signal on the ground, applying the correction as the data came back down.
- Pre-tune the clocks on the satellites to run slow by exactly the right amount, so that by the time relativity sped them back up, they’d tick at the right rate.
GPS chose option two.
They built the clocks to run at 10.22999999543 MHz instead of the nominal 10.23 MHz, so that orbital relativity speeds them up to ~10.23 MHz by the time the signal hits your phone.
The correction is baked in.
That’s what putting clocks in space looks like. One decision, and now everyone on Earth gets both navigation and time from the same signal.
This post is about the impact of that decision.
Why Clocks in Orbit Run Faster
Two relativistic effects act on a GPS satellite clock, and they push in opposite directions.
Special relativity slows the satellite clock down because it’s moving fast, by about 7 microseconds per day. General relativity speeds it up because it sits in weaker gravity than the ground, by about 45 microseconds per day. Gravity wins. Net result: the satellite clock gains about 38 microseconds per day.
Sounds like nothing. But light travels about 11 km in 38 microseconds, so an uncorrected clock adds roughly 11 km of position error for every 24 hours of drift. Within a day of launch, GPS would be useless for anything more precise than “are you in the right country.”
This was known before launch. It was tested. It works.
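The two effects can be checked with a few lines of arithmetic. The sketch below is a rough estimate, not the calculation used in the GPS specification: it assumes a circular orbit with a radius of about 26,560 km, a non-rotating spherical Earth, and textbook constants, none of which come from the text above.

```python
# Assumed textbook constants (not taken from the post).
C = 299_792_458.0          # speed of light, m/s
GM = 3.986004418e14        # Earth's gravitational parameter, m^3/s^2
R_EARTH = 6.371e6          # mean Earth radius, m
R_ORBIT = 2.656e7          # approximate GPS orbit radius, m
SECONDS_PER_DAY = 86_400
NOMINAL_HZ = 10.23e6       # nominal GPS clock frequency quoted in the post


def velocity_shift() -> float:
    """Fractional rate change from orbital speed (special relativity, negative)."""
    v_squared = GM / R_ORBIT            # circular-orbit speed squared
    return -v_squared / (2 * C**2)


def gravity_shift() -> float:
    """Fractional rate change from weaker gravity (general relativity, positive)."""
    return (GM / C**2) * (1 / R_EARTH - 1 / R_ORBIT)


def net_microseconds_per_day() -> float:
    """Net daily gain of the orbiting clock, in microseconds."""
    return (velocity_shift() + gravity_shift()) * SECONDS_PER_DAY * 1e6


def position_error_km_per_day() -> float:
    """Distance light covers during one day's accumulated clock error."""
    return net_microseconds_per_day() * 1e-6 * C / 1000


def pretuned_frequency_hz() -> float:
    """Ground setting that makes the orbiting clock appear to tick at nominal."""
    return NOMINAL_HZ * (1 - (velocity_shift() + gravity_shift()))


if __name__ == "__main__":
    print(f"net gain:       {net_microseconds_per_day():.1f} us/day")
    print(f"position error: {position_error_km_per_day():.1f} km/day")
    print(f"pre-tuned rate: {pretuned_frequency_hz():.5f} Hz")
```

Under these assumptions the net gain comes out between 38 and 39 microseconds per day, a little over 11 km at the speed of light, and the pre-tuned frequency lands within a small fraction of a millihertz of the 10.22999999543 MHz figure.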
What GPS Time Actually Is
GPS time is its own scale, started at midnight on January 6, 1980, and ticking continuously since. No leap seconds. No time zones.
The relationship to the other scales is fixed and simple:
GPS = TAI − 19 seconds (constant since launch)
GPS = UTC + 18 seconds (today)

GPS−TAI never changes. GPS−UTC grows every time UTC gets a leap second, and it should freeze once leap seconds are discontinued, which is currently planned for 2035.
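As a rough illustration of those offsets, the sketch below converts a UTC timestamp into a GPS week number and seconds into that week. It hard-codes the 18-second GPS−UTC offset quoted above, so it is only valid for timestamps from 2017-01-01 onward (the last leap second was inserted at the end of 2016). The function name is illustrative, not any receiver's API.

```python
from datetime import datetime, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)
GPS_MINUS_UTC = 18            # seconds; grows by one with each new leap second
TAI_MINUS_GPS = 19            # seconds; fixed
SECONDS_PER_WEEK = 7 * 86_400


def utc_to_gps(utc: datetime) -> tuple[int, int]:
    """Return (GPS week, seconds into week) for a UTC time on or after 2017-01-01."""
    # Python datetimes ignore leap seconds, so the offset is added by hand.
    elapsed = int((utc - GPS_EPOCH).total_seconds()) + GPS_MINUS_UTC
    return divmod(elapsed, SECONDS_PER_WEEK)


if __name__ == "__main__":
    print(utc_to_gps(datetime(2017, 1, 1, tzinfo=timezone.utc)))  # → (1930, 18)
```

A production converter would need a leap-second table to handle earlier dates. The constant offset is the simplification here.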
It is, in every meaningful sense, the most accurate clock in your daily life. And you’ve never seen it.
What It’s Used For
GPS time runs almost everything that needs precise timing in modern civilization, but it’s invisible because nobody consumes it directly.
- Finance. US and EU regulators (MiFID II, SEC) require trading firms to timestamp orders to microsecond precision. GPS-disciplined oscillators are how.
- Telecom. Cellular base stations need their carrier frequencies aligned across the network. GPS clocks them. Without GPS, your phone would struggle to hand off between towers.
- Power grid. Phasor Measurement Units monitor the AC waveform across the entire grid, synchronized to GPS. This is how grid operators detect instabilities before they cascade into blackouts.
- Datacenters. Stratum-1 NTP servers are typically GPS-disciplined. Every clock you’ve ever checked on a computer ultimately traces back, through several layers of network sync, to a GPS receiver in someone’s rack.
- Aviation, surveying, autonomous vehicles, drones, scientific instruments, particle physics. Anything built since 1995 that needs accurate timing or positioning, which is essentially everything.
The civilian world runs on GPS time. It just doesn’t admit it.
What It Didn’t Solve
Putting clocks in space solved navigation.
It did not solve timekeeping.
Your watch is still on local time. Your calendar uses civil dates with leap seconds buried in UTC. You’re reading a clock face anchored to a Roman calendar, a Babylonian 24-hour day, and an Earth rotation that nobody can predict.
GPS time is great if you are a satellite, a financial trader, a power-grid engineer, a fighter jet, or a cell tower.
It is not great if you are trying to know what time to pick up your kid from school.
For that, you still need wall time, which still needs UTC, which still needs leap seconds, which still needs Earth’s wobbling rotation.
We built absurdly precise atomic clocks. We launched them into orbit. We baked relativity corrections into the silicon. We covered the planet in time signals accurate to nanoseconds.
And your meeting is still at 3 PM on Tuesday.
GPS quietly handles the part it needs to handle. But all of this assumes you’re on Earth.
Where This Goes
Earth orbit needs relativity corrections. The Moon needs more. Mars needs different ones still.
The further you get from Earth, the more “GPS-style time” stops being a solution.
Tomorrow: if hours are an Earth measurement, how do you tell time on a planet that doesn’t have them?
Sources
- Error analysis for the Global Positioning System — Wikipedia
- GPS time — Wikipedia
- Schriever Space Force Base — Wikipedia
- Phasor measurement unit — Wikipedia
I’d appreciate a follow. You can subscribe with your email below. The emails go out once a week, or you can find me on Mastodon at @[email protected].
-
Running Terraform in Your Existing CI Pipeline
The previous post made the case that HCP Terraform’s per-resource pricing model has gotten structurally hostile to modern infrastructure patterns. (The earlier posts in this series argued that OpenTofu is the no-regrets default for new infrastructure, and walked through when to skip Terraform entirely in favor of cloud-native tooling.) The natural follow-up: if you don’t want to pay the commercial orchestration tax, can you run Terraform or OpenTofu properly inside your existing CI/CD? The answer is yes, but the gap between “it works” and “it works well” requires some deliberate architecture. This post is about how to close that gap.
There are three pieces: where the state lives, how the pipeline authenticates to your cloud, and what handles the orchestration concerns (locking, PR commentary, drift detection) that TACOs (Terraform Automation and Collaboration Software platforms such as HCP Terraform and Spacelift) sell as their core value. Each one has a sensible 2026 answer that doesn’t involve paying anyone.
State Management in GitLab
If you’re on GitLab, the entire state-management problem is solved natively. GitLab ships an HTTP backend for Terraform and OpenTofu state on every tier including Free. You don’t need to provision an S3 bucket. You don’t need a DynamoDB lock table. You don’t need to figure out KMS. The state file is encrypted in transit and at rest, locking is handled by GitLab’s project-scoped role-based access control, and there’s a native UI under Operate > Terraform states that shows you version history and lets you roll back if something corrupts.
The pipeline pattern uses a backend block like this:
    terraform {
      backend "http" {}
    }

Combine this with the gitlab-tofu (or gitlab-terraform) CLI wrapper in your .gitlab-ci.yml, which dynamically configures the HTTP backend at runtime using the per-job ${CI_JOB_TOKEN}. The wrapper avoids passing backend credentials via -backend-config arguments (which can end up in pipeline logs) and handles authentication automatically.

The RBAC story is also worth pointing out, because it’s exactly what TACOs charge thousands of dollars to replicate: the GitLab project’s role model becomes the IaC permissions model. Developers can read state and run tofu plan -lock=false. Maintainers and Owners can lock state and run tofu apply. The audit log is the GitLab activity feed. No additional configuration, no additional vendor.

For GitLab shops, this is the single highest-leverage decision in the entire IaC stack: stop paying for state management when your VCS gives it to you for free.
Secretless Authentication on GitHub Actions
On GitHub Actions, the equivalent problem is authentication. Historically, every Terraform-on-Actions tutorial told you to put a long-lived AWS access key in GitHub Secrets. That’s the worst possible pattern. A compromised repository, a malicious third-party action, or a leaked log line gives the attacker permanent, unscoped access to your cloud.
The 2026 answer is OpenID Connect with cloud-side trust policies. The pipeline gets ephemeral, short-lived credentials per job, scoped to the specific repository and branch that initiated the run. Nothing persists.
For AWS: configure GitHub’s OIDC provider (token.actions.githubusercontent.com) as an identity provider in IAM. Create an IAM role with a trust policy that conditionally allows assumption based on JWT claims like sub (subject) and aud (audience). The workflow uses aws-actions/configure-aws-credentials to exchange a GitHub-issued JWT for temporary AWS credentials via AssumeRoleWithWebIdentity. The trust policy can be scoped to a specific repository, a specific branch (main only), or even a specific environment (production).

For GCP: the equivalent is Workload Identity Federation. You create a Workload Identity Pool that trusts GitHub’s OIDC provider, configure attribute mapping and a condition that validates the token claims (e.g., requiring assertion.repository == "company/infra-prod"), and grant the pool’s principal the ability to impersonate a specific GCP service account. The official google-github-actions/auth action handles the token exchange.

Both patterns produce credentials that expire on a short timer, are of little use to an attacker who copies them into long-term storage, and leave a clean audit trail in your cloud’s IAM logs. There is no good reason to use long-lived cloud credentials in CI in 2026.
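To make the AWS side concrete, here is a minimal sketch that builds the kind of trust policy described above and prints it as JSON. The account ID is a placeholder, the helper name is made up for illustration, and the repository name simply reuses the one from the GCP example. Treat it as a starting point to adapt, not a vetted production policy.

```python
import json


def github_oidc_trust_policy(account_id: str, repo: str, branch: str = "main") -> dict:
    """Build an IAM trust policy that only one repository and branch can assume."""
    provider = "token.actions.githubusercontent.com"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # aud: the audience the token was requested for
                        f"{provider}:aud": "sts.amazonaws.com",
                        # sub: pins the role to one repository and one branch
                        f"{provider}:sub": f"repo:{repo}:ref:refs/heads/{branch}",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID and example repository name.
    policy = github_oidc_trust_policy("123456789012", "company/infra-prod")
    print(json.dumps(policy, indent=2))
```

Scoping to an environment instead of a branch would change the sub value to the form repo:ORG/REPO:environment:NAME.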
What CI Doesn’t Give You for Free
Native CI/CD solves the cost problem. It does not, by itself, solve every operational problem that commercial TACOs address. There are three real gaps worth knowing about:
State locking and race conditions. Standard CI/CD systems are designed for concurrent runs because that’s what application code wants. Infrastructure code wants the opposite. If two PRs merge at the same time and both trigger tofu apply, you have two concurrent processes racing to mutate the same state file. With GitLab’s HTTP backend or an external lock backend like DynamoDB, the lock will prevent corruption but one job will fail with a confusing error. Without it, you get state corruption. You need some queuing logic, either custom or via an orchestrator.

PR plan commentary. TACOs post the output of terraform plan directly into the PR so reviewers can see what’s about to change before merging. In raw CI/CD this requires a third-party action (terraform-plan-pr-commenter and similar), parsing of the CLI output, handling of PR comment character limits, and securely passing the binary plan file as a workflow artifact from the plan stage to the apply stage. None of this is hard, but it’s a real chunk of YAML you have to maintain.

Cost estimation. TACOs include built-in cost estimation on every plan. Adding this to your own pipeline means picking up a third-party FinOps or IaC cost-analysis tool (there are several worth comparing), running it against your plan output, parsing the JSON, comparing against budget thresholds, and posting deltas into PRs. None of that is hard, but it’s another bit of integration to own.
You can build all of this yourself. Plenty of teams do. The question is whether maintaining the bash and YAML is cheaper than using an open-source orchestrator designed for exactly this problem.
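As one small example of the glue involved, here is a sketch of fitting plan output into a PR comment. The 65,536-character limit is an assumption about GitHub’s comment size cap, and the function name and wording are hypothetical. A real pipeline would also need to post the result through the VCS API.

```python
GITHUB_COMMENT_LIMIT = 65_536   # assumed maximum comment length, in characters


def plan_comment(plan_output: str, limit: int = GITHUB_COMMENT_LIMIT) -> str:
    """Wrap plan text for a PR comment, cutting out the middle if it is too long.

    Assumes `limit` is comfortably larger than the header plus the notice.
    """
    header = "Plan output:\n\n"
    notice = "\n... plan truncated, see the CI job log for the full output ...\n"
    budget = limit - len(header)
    if len(plan_output) <= budget:
        return header + plan_output
    # Keep the start (first resource changes) and the end (the summary line).
    keep = budget - len(notice)
    head = plan_output[: keep // 2]
    tail = plan_output[-(keep - keep // 2):]
    return header + head + notice + tail
```

Keeping the tail matters because the one-line summary of additions, changes, and deletions sits at the end of the plan output, and that is usually what a reviewer looks for first.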
Open-Source Tools to Layer In
None of these are drop-in replacements for HCP Terraform or Spacelift. They solve specific problems CI/CD doesn’t handle on its own, and you compose them based on what’s actually missing from your setup.
| Tool | What It Solves | Best For |
| --- | --- | --- |
| Atlantis | PR-based workflow automation, plan/apply via PR comments, PR-level locking | Teams that want TACO-style PR workflow but on their own server |
| Digger | Same PR workflow + locking, but the IaC actually runs inside your existing CI runners | Teams with secretless OIDC pipelines who don’t want to maintain a separate server |
| Terramate | Multi-stack monorepo orchestration, git-based change detection, parallel execution | Teams whose Terraform has grown into hundreds of stacks |

Atlantis is the original PR-automation tool, accepted into the CNCF Sandbox in June 2024. It deploys as a Go binary or container, listens for VCS webhooks, and runs Terraform on its own server. The architecture is showing its age. It’s stateful and single-threaded, granting it persistent privileged cloud access creates a high-value target, and maintenance velocity has slowed. If you’re already running it and it works, fine. For new setups, the case for Digger is usually stronger.
Digger is a thinner orchestration layer. It coordinates Terraform jobs but runs them inside your existing GitHub Actions or GitLab runners, using OIDC for cloud authentication. The orchestrator backend itself never sees state, plan output, or cloud credentials. This is the right pattern if you’ve already built secretless OIDC pipelines and want PR-workflow automation without introducing another long-lived privileged component.
Terramate solves a different problem: scaling Terraform across many stacks in a monorepo. It parses your Git history to determine which stacks changed, then runs plan and apply only on those, in parallel. For a repo with 200 stacks and a PR that touches one, you skip the 199 unnecessary plans. It also has a code-generation system that reduces HCL boilerplate. Terramate Cloud adds dashboards and drift detection without requiring access to cloud credentials. If your IaC repo has gotten unwieldy, Terramate is the tool for it. It’s a complement to Atlantis or Digger, not a substitute.

The Recommendation
The full picture for escaping commercial TACOs in 2026:
- State: GitLab’s native HTTP backend if you’re on GitLab. S3 + DynamoDB (or OpenTofu state encryption + S3) if you’re on GitHub.
- Auth: OIDC for AWS, Workload Identity Federation for GCP. Never long-lived secrets.
- PR workflow: Digger if you want PR automation that runs inside your existing CI. Atlantis if you’re already running it. Skip this layer entirely if your team is small enough that PRs serialize naturally.
- Stack management: Terramate if you have a large monorepo. Otherwise, not needed.
- Cost estimation: Pick a third-party FinOps or IaC cost-analysis tool and wire it into your plan stage.
The total monetary cost of this stack is the price of your existing CI/CD minutes, which you’re already paying. The total time cost is on the order of one to two weeks of platform-engineering time to set up properly, plus ongoing maintenance proportional to how much you customize.
For most organizations under 300 engineers, that’s cheaper than HCP Terraform Standard or Premium. For larger organizations, the calculus depends on how much custom platform work you’re willing to absorb versus how much you want a vendor to handle.
This wraps the series. Four posts in: OpenTofu as the no-regrets default engine, the scenarios where cloud-native tools beat Terraform entirely, the HCP pricing model that’s pushing teams to find alternatives, and now the CI-native path that lets you skip commercial orchestration. The throughline is the same as every post in this blog about platform engineering: there isn’t a single open-source tool that drops in for HCP Terraform or Spacelift. You’re assembling a stack from focused pieces (state backend + auth + maybe PR automation + maybe stack management), accepting some operational tax in exchange for not paying the SaaS premium. For most teams under 300 engineers, that tradeoff is worth it.
Sources
- GitLab-managed Terraform/OpenTofu state — GitLab Docs
- How to Manage Terraform State with GitLab — Spacelift
- Using Terraform to connect GitHub Actions and AWS with OIDC — Thiago Salvatore
- Deploy Terraform resources to AWS using GitHub Actions via OIDC
- Configure Workload Identity Federation with deployment pipelines — GCP Docs
- Terraform Deployment to GCP Using GitHub Actions and Workload Identity Federation
- Atlantis vs. Terraform Cloud / Terraform Enterprise — Spacelift
- Digger and Atlantis: key differences
- Terramate: Turn Your IaC into a Lightning-Fast Platform
- How to Implement Cost Checks in Terraform CI/CD Pipelines — OneUptime
- Terraform Plan PR Commenter (GitHub Action)
-
HCP Terraform's Per-Resource Pricing Is a Trap
The first post in this series argued OpenTofu is the no-regrets default for new infrastructure. The previous post mapped out when to skip cloud-agnostic IaC entirely. This one is about what happens to organizations that picked Terraform years ago, built their orchestration around HCP Terraform (formerly Terraform Cloud), and are now opening renewal quotes that have doubled or tripled year-over-year.
The short version: HashiCorp’s 2024 pivot to Resources Under Management (RUM) billing penalizes the architectural patterns the DevOps community spent a decade adopting. Modular code, ephemeral environments, and granular resources are all things you were supposed to do with Terraform. They now cost real money under the new pricing model. And the legacy free tier that grandfathered teams into a more sustainable cost structure hit end-of-life on March 31, 2026.
If you’re still on HCP Terraform in 2026, you need to understand the math.
How the New Pricing Works
The 2024 RUM model bills based on the peak number of resources tracked in your terraform.tfstate files, measured hourly. The Free tier covers up to 500 resources with a single concurrent run. Above that, you’re on Pay-As-You-Go tiers:

| Tier | Per-resource cost | Concurrency | What you get |
| --- | --- | --- | --- |
| Free | $0 (first 500) | 1 | Basic VCS, remote state |
| Essentials | ~$0.10/month | 1 | Basic provisioning, no SSO |
| Standard | ~$0.47/month | 3 | Up to 5 policy checks, cost estimation, limited RBAC |
| Premium | ~$0.99/month | 10 | Full governance, unlimited policies, SSO, audit logs |

On paper, $0.47 per resource per month looks negligible. The math goes sideways quickly because of three things.
Why “Resources” Is a Footgun
1. Granularity inflation. A single logical Terraform module produces dozens of underlying resources. An AWS VPC module isn’t one billable resource. It’s the VPC plus every subnet, every route table, every route table association, every IAM policy attachment, every security group rule, every DNS record. A widely-shared Reddit post by user notoriousbpg describes a team whose HCP Terraform bill was about to jump from $0 to over $15,000 a year, because 80% of the resources under management were GraphQL operation mappings to data sources, while the actual AWS infrastructure they cared about cost only $8,000. They were paying more for orchestration than for the infrastructure being orchestrated.

2. Idle workspaces. RUM billing doesn’t distinguish between active and inactive infrastructure. The proof-of-concept workspace someone spun up last quarter and never destroyed is still on your bill. The staging environment that was deprecated in favor of ephemeral environments is still on your bill. Industry telemetry suggests 30–40% of an average organization’s RUM cost is for completely idle infrastructure nobody has bothered to terraform destroy.

3. Hourly peak billing on ephemeral resources. HCP Terraform bills based on peak hourly resource count. If your integration test pipeline spins up infrastructure that exists for five minutes and is then torn down, you’re billed as if it existed for the full hour. This is a direct tax on the modern GitOps workflow patterns Terraform itself spent years promoting. The more ephemeral environments you use, the more punitive the billing becomes.
The compounding effect is severe. Another account describes cloning a 600-resource production workspace to create a pre-production environment. The resource count doubles to 1,200. The annual cost goes from ~$122 to ~$858, a 7x increase for what’s architecturally a trivial change. Multiply that across every environment, every test fixture, every modular abstraction, and the renewal quote stops being theoretical.
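The 7x jump falls out of the free allowance: only resources above 500 are billed, so doubling the total multiplies the billable count by seven. A minimal sketch, assuming the rounded ~$0.10 per resource per month Essentials rate from the table and a steady resource count all year:

```python
FREE_RESOURCES = 500      # free allowance quoted in the pricing table
RATE_PER_MONTH = 0.10     # rounded Essentials rate, USD per resource per month


def annual_cost(resources: int, rate: float = RATE_PER_MONTH) -> float:
    """Yearly bill for a steady resource count, charging only above the free tier."""
    billable = max(0, resources - FREE_RESOURCES)
    return billable * rate * 12


if __name__ == "__main__":
    before = annual_cost(600)     # production only: 100 billable resources
    after = annual_cost(1200)     # production plus its clone: 700 billable
    print(f"${before:.0f} -> ${after:.0f} ({after / before:.0f}x)")
```

With the rounded rate this gives about $120 and $840 a year. The ~$122 and ~$858 quoted above imply a slightly higher unrounded rate, but the 7x ratio is the same either way.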
The Alternative TACOs
Once organizations work through the RUM math and realize the bill is structurally unsustainable, the obvious move is to look at alternative orchestration platforms. The serious contenders, with very different pricing models:
| Platform | Pricing Model | Entry / Mid-Tier Cost | What It Does Differently |
| --- | --- | --- | --- |
| Spacelift | Resources + runs + seats | $1,500–$3,500/mo | Multi-tool (Pulumi, K8s manifests, Terragrunt). OPA policies. Custom runners. Cross-stack dependencies. |
| env0 | Per-user | ~$50/user/mo | Predictable user-based pricing. Strong TTL/ephemeral environment story. |
| Scalr | Per-user | ~$50/user/mo | Familiar Terraform Cloud UI replacement. Lower entry price than Spacelift. |
| ControlMonkey | Fixed plan (users + assets) | $800/mo (Startup: 10 users, 5k assets, 500 deploys) | One-click Terraform import, automatic drift remediation, daily cloud-config backups, built-in compliance. |

Spacelift is the choice for complex platform engineering teams. It supports Terraform, OpenTofu, Pulumi, Terragrunt, and Kubernetes manifests in one platform, handles cross-stack dependencies, and bakes OPA policy enforcement into the runtime. The catch is that its pricing still factors in managed resources, so the bill scales with infrastructure size, just less aggressively than HCP.
env0 and Scalr both flipped to user-based pricing specifically as a response to RUM. A 15-engineer team managing 3,000 resources pays roughly the same on env0 as a 15-engineer team managing 500. The price is bounded by headcount, not infrastructure complexity. This is the right model for teams whose resource counts have ballooned because they followed the “do everything as code” advice and now have hundreds of granular Terraform-managed entities they don’t want to pay per-unit fees on.
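The difference between the two models is easy to see side by side. This sketch uses the approximate list prices from the tables above (~$50 per user, ~$0.47 per resource at the Standard tier) and assumes the 500-resource free allowance is subtracted before per-resource billing. Both are rough figures for illustration, not quotes.

```python
def per_user_monthly(engineers: int, price_per_user: float = 50.0) -> float:
    """Seat-based pricing: the resource count never enters the formula."""
    return engineers * price_per_user


def per_resource_monthly(resources: int, rate: float = 0.47, free: int = 500) -> float:
    """Resource-based pricing: the bill tracks infrastructure size."""
    return max(0, resources - free) * rate


if __name__ == "__main__":
    for resources in (500, 3000):
        seats = per_user_monthly(15)
        rum = per_resource_monthly(resources)
        print(f"{resources} resources: per-user ${seats:.0f}/mo, per-resource ${rum:.0f}/mo")
```

For the 15-engineer team, the seat-based bill stays flat while the resource-based one grows from nothing to more than the seat-based figure as the estate expands from 500 to 3,000 resources.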
When to Pay for Any Commercial TACO
The harder question is whether the commercial orchestration layer is worth its multi-thousand-dollar monthly bill at all. The features TACOs sell (state locking, PR-level plan output, policy enforcement, drift detection, audit logging) are all things you can build into your own CI/CD pipeline. The question is whether building and maintaining that pipeline is cheaper than paying the SaaS fee.
For most teams under ~50 engineers, the commercial layer isn’t worth it. The SaaS fee buys polish and convenience, but the underlying capabilities are available in GitLab’s native state management or in GitHub Actions with the right open-source orchestrator. For larger teams, the calculus shifts: the cost of a dedicated platform engineer maintaining a custom CI/CD pipeline starts to approach the cost of a commercial license, and the operational predictability of a managed platform becomes valuable.
But the days of HCP Terraform being the obvious default for everyone above the free tier are over. The RUM model made the math too punishing for too many real-world architectures.
The next and final post in this series gets into the actual mechanics of running Terraform/OpenTofu inside your existing CI/CD: GitLab’s native state backend, GitHub Actions with OIDC/Workload Identity Federation for secretless deploys, and the open-source orchestrators (Atlantis, Digger, Terramate) that close the gap between raw YAML and a real platform.
Sources
- Terraform Cloud / Enterprise Pricing — Tiers Overview 2026 — Spacelift
- Terraform Cloud Pricing Guide: Tiers, Costs, and Optimization Tips — ControlMonkey
- 10 Best Terraform Cloud Alternatives & Competitors In 2026 — ControlMonkey
- Continuing HCP Terraform’s enhanced free tier experience — HashiCorp
- Terraform Cloud Pricing Explained: Resource-Based Guide (2026) — Firefly
- Spacelift Software Pricing & Plans 2026 — Vendr
- Terraform Cloud Pricing: A Complete Guide (2026) — env0
-
When You Should Skip Terraform Entirely
The last post in this series made the case that OpenTofu is the no-regrets default for new infrastructure projects. That’s true for the broad case of cloud-agnostic or multi-cloud setups where HCL parity, provider breadth, and a Linux Foundation governance model matter.
It’s also not the whole story. There are at least three common scenarios where the right answer in 2026 isn’t Terraform or OpenTofu. It’s the cloud-native tool the hyperscaler ships with its platform. AWS has CloudFormation and the CDK. Azure has Bicep. GCP has Config Connector. Each one is technically superior to Terraform inside its own ecosystem, and each one removes a category of operational pain that Terraform inflicts.
If you reflexively reach for Terraform every time, you’re probably overpaying in complexity for a multi-cloud option you’ll never exercise.
The Small AWS-Native Startup: Use CDK
If your engineering team is small, you’re shipping a SaaS product, and you’re 100% on AWS, you should probably ignore Terraform entirely. The right tool is the AWS Cloud Development Kit, layered on top of CloudFormation.
The fundamental win is that CloudFormation eliminates state management. There is no terraform.tfstate file. No S3 bucket to provision. No DynamoDB lock table. No state-encryption configuration to figure out. The state lives in the AWS control plane, AWS manages locking and consistency, and your CI pipeline doesn’t need to know about any of that. For a small team, that’s a meaningful operational tax you don’t pay.

The CDK is the part that makes this pleasant. It lets you define infrastructure in TypeScript, Python, Java, C#, or Go, the languages your application engineers already know. There’s no HCL learning curve, no Sentinel policy DSL, no jq-in-bash to manipulate plan output. You write code, the CDK synthesizes CloudFormation templates, CloudFormation provisions the infrastructure.
The objection people raise is “what if you go multi-cloud later?” In practice, most SaaS startups don’t. They get acquired, they pivot, or they grow large enough to have a dedicated platform team that does the migration deliberately. Optimizing for a hypothetical multi-cloud future that 90% of teams will never need is the textbook definition of premature abstraction. If you’re an AWS-native startup with fewer than 50 engineers and no concrete plans to leave AWS, the cost of running Terraform-as-multi-cloud-insurance is higher than the cost of a future migration that probably won’t happen.
The Azure Enterprise: Bicep, Unless You Need More
For organizations heavily invested in Microsoft’s stack (Azure for compute, Azure DevOps for CI/CD, and Entra ID for identity), Bicep is the technically correct choice for most workloads.
Bicep is Azure’s domain-specific language for infrastructure, designed as a replacement for the verbose ARM JSON templates everyone hated. Like CloudFormation, it’s stateless. You submit a desired-state Bicep file to the ARM control plane and ARM reconciles. No state file, no remote backend, no risk of corruption. Authentication is whatever RBAC permissions the deploying identity already has, with no provider credential configuration required.
Bicep also gets day-zero feature support for new Azure capabilities. When Microsoft ships a new service, you can use it in Bicep the same day. The Terraform AzureRM provider has historically lagged by weeks or months, occasionally longer.
The catch is scope. Bicep manages Azure. That’s the entire surface area. Larger organizations tend to need management of things outside Azure too: GitHub repositories and branch protection, Entra ID groups, Datadog monitors, PagerDuty escalation policies, whatever SaaS services your platform touches. Bicep has no answer for any of that.
That leaves two paths. The first is a hybrid: Bicep for Azure, separate tools for everything else, accept the cost of context-switching and the inability to express cross-domain dependencies in a single deployment. The second is Terraform or OpenTofu for everything, accepting the heavier operational tax of stateful IaC, in exchange for one tool that can do all of it. Neither is wrong; they’re different tradeoffs against the same constraint.
The decision rule: if you’re managing only Azure resources, use Bicep. If you have cross-domain provisioning needs and you’d rather not maintain two parallel IaC stacks, Terraform (or OpenTofu) earns its keep.
The GCP/Kubernetes Shop: Hybrid by Design
For organizations heavily committed to Google Cloud and running most workloads on GKE, the right architecture isn’t either/or. It’s a hybrid that uses Terraform for the foundation and Config Connector for the application layer.
Config Connector is a GCP-shipped Kubernetes add-on. It lets you manage GCP resources (Cloud SQL instances, Pub/Sub topics, storage buckets, service accounts) as standard Kubernetes Custom Resources. You write a YAML manifest, you kubectl apply, and a controller in the cluster reconciles the real-world GCP resource to match.

The differentiator is continuous reconciliation. Terraform is episodic: it checks state at plan and apply time, and the rest of the time your infrastructure is unmonitored. If someone clicks around in the GCP console and manually changes a setting, Terraform won’t notice until the next pipeline run. Config Connector runs a controller loop that polls continuously. Manual drift gets reverted in real time.

The right architectural boundary:
- Platform layer (Terraform/OpenTofu): VPCs, subnets, foundational IAM, the GKE clusters themselves. These are slow-moving, security-critical, and you want a deliberate pipeline approval flow for them.
- Application layer (Config Connector): Application-specific buckets, databases, service accounts, Pub/Sub topics. Application teams own these via the same YAML manifests they use for their pods, with the same GitOps workflow they already understand.
This pattern gives platform teams strict guardrails on the foundation while letting application developers self-serve the resources their services need, without filing a Terraform PR every time they want a new bucket.
The Decision Rule
The honest version of all of this: Terraform/OpenTofu is the right answer when you need cross-domain or cross-cloud governance. For everything else, the cloud-native tool is usually less work, more current with the platform, and avoids the operational tax of state management.
A reasonable decision tree:
- Single-cloud, small team, AWS: AWS CDK + CloudFormation.
- Single-cloud, single-domain, Azure: Bicep.
- GCP with heavy Kubernetes use: Hybrid — Terraform/OpenTofu for foundation, Config Connector for application resources.
- Multi-cloud, or cross-domain platform engineering (GitHub + cloud + identity + monitoring): OpenTofu.
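Written out as code, the tree looks something like the sketch below. The inputs are deliberately coarse, the function name is made up, and the final fallback is my assumption for cases the list doesn’t cover.

```python
def pick_iac_tool(
    clouds: set[str],
    cross_domain: bool = False,
    small_team: bool = False,
    heavy_kubernetes: bool = False,
) -> str:
    """Mirror the decision tree above for a given set of cloud providers."""
    # Multi-cloud or cross-domain needs win over everything else.
    if len(clouds) > 1 or cross_domain:
        return "OpenTofu"
    if clouds == {"aws"} and small_team:
        return "AWS CDK + CloudFormation"
    if clouds == {"azure"}:
        return "Bicep"
    if clouds == {"gcp"} and heavy_kubernetes:
        return "Terraform/OpenTofu for foundation, Config Connector for application resources"
    # Assumed fallback: the list above doesn't say what to do here.
    return "OpenTofu"


if __name__ == "__main__":
    print(pick_iac_tool({"aws"}, small_team=True))
    print(pick_iac_tool({"azure"}, cross_domain=True))
```

The ordering is the design choice worth noticing: the cross-domain check comes first, so an Azure shop that also manages GitHub and monitoring as code lands on OpenTofu rather than Bicep.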
The mistake I think most teams are making is defaulting to Terraform because it’s the tool the senior engineer learned in their last job. The platform-engineering pitch (“we’ll standardize on Terraform so we can move to any cloud later”) is correct in theory but almost never exercised in practice. If your team isn’t using the cross-cloud capability today, you’re paying for an insurance policy you’ll never collect on.
Next post in this series digs into the other side of that calculation: what HCP Terraform actually costs in 2026, and why even teams that need cloud-agnostic IaC are looking for the exit from the commercial orchestration platforms.
Sources
- Bicep Vs Terraform: Choosing The Best IaC Tool For Azure — Synextra
- Terraform vs Bicep vs ARM Templates 2026 Compared — Exodata
- Comparing Terraform and Bicep — Microsoft Learn
- Terraform vs Bicep vs ARM: Lessons from the Trenches — Vaibhav Gujral
- How to Use the GCP Config Connector with Terraform — OneUptime
- How Config Connector compares for infrastructure management — Google Cloud Blog
- Are Terraform’s days numbered? — Alistair Grew
-
OpenTofu Is the No-Regrets Default for 2026 Infrastructure
Hashicorp’s adoption of the Business Source License in late 2023 was a defensive business decision. Companies like Spacelift, env0, and Scalr were building paid commercial platforms on top of MPL-licensed Terraform, capturing significant revenue from an ecosystem Hashicorp was largely funding. The same pattern played out with Redis Labs facing AWS ElastiCache, Elastic facing Amazon OpenSearch, and MongoDB facing the cloud hyperscalers before its move to the SSPL. The BSL is a rational corporate play: keep the core open enough to preserve mindshare, restrict the terms enough that pure resellers can’t extract value without engaging commercially. From the standpoint of a publicly traded company with a board to answer to, it made sense.
But it also broke a tacit contract. Hashicorp had spent a decade positioning Terraform as infrastructure’s `git`. Neutral, ubiquitous, irreplaceable. A license that lets a single vendor change the terms when the shareholder math demands it is not neutral, and a large portion of the community decided they weren’t comfortable with that risk. The Linux Foundation forked the last MPL-licensed Terraform release and shipped it as OpenTofu. Two years later, OpenTofu has crossed 10 million downloads, holds HCL parity with Terraform, supports the same provider ecosystem (AWS, Azure, GCP, Kubernetes, everything), and ships features Terraform itself doesn’t have.
For greenfield infrastructure in 2026, OpenTofu is the no-regrets default. For existing Terraform codebases, the migration is mostly a binary swap. The reasons to still pay for Terraform are mostly inertia. Let me explain.
The Migration Is Mostly Free
The technical case for “stay on Terraform” essentially doesn’t exist. OpenTofu reads the same HCL. It produces the same execution plans. It maintains the same state file format. It interfaces with the same providers, including the ones Hashicorp wrote, because the provider API was never the part Hashicorp tried to lock down.
To migrate a non-trivial Terraform codebase to OpenTofu, you do roughly this:
- Swap `terraform` for `tofu` in your CI binary install step.
- Update any pipeline scripts that hardcoded the binary name.
- Run `tofu init -migrate-state` once.
- Run `tofu plan` and confirm it produces an empty diff against the existing state.
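The last step is easy to turn into a CI gate. Here is a minimal sketch, assuming `tofu` is on the PATH and that `plan -detailed-exitcode` follows the usual convention (0 for an empty diff, 1 for an error, 2 for pending changes). The helper names `classify_plan_exit` and `migration_is_clean` are mine, not part of any tool.

```python
import subprocess

# Exit statuses used by `plan -detailed-exitcode`.
PLAN_RESULTS = {0: "no-changes", 1: "error", 2: "changes-pending"}


def classify_plan_exit(code: int) -> str:
    """Map a `plan -detailed-exitcode` exit status to a readable label."""
    return PLAN_RESULTS.get(code, "unknown")


def migration_is_clean(workdir: str, binary: str = "tofu") -> bool:
    """Run a plan in `workdir` and report whether the diff is empty."""
    proc = subprocess.run(
        [binary, "plan", "-detailed-exitcode", "-input=false"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(proc.returncode) == "no-changes"
```

Failing the pipeline when `migration_is_clean` returns `False` gives you a cheap safety net: the binary swap only merges if the new engine sees exactly the infrastructure the old one did.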
There are edge cases, like modules pinned to specific Terraform-version constraints or providers that gated features on the Hashicorp-only registry. But for the vast majority of codebases, the migration is a one-afternoon job, including the PR review and the team announcement.
What you get in exchange is governance under the Linux Foundation, an active multi-vendor contributor base, no future license surprises, and one genuinely useful feature Terraform currently lacks: native state encryption.
State Encryption Is the Real Reason
Terraform state files have a property nobody enjoys discussing. They contain everything sensitive about your infrastructure, and they store it in plaintext.
That’s not a misconfiguration. That’s the design. The `terraform.tfstate` JSON file holds resource IDs, ARNs, network topology, credentials surfaced as outputs, RDS connection strings, and any sensitive value a module decided to track. When you use S3 or Azure Blob as a remote backend, you get encryption at rest, meaning the cloud provider’s storage layer is encrypted. The state itself, the thing your CI pipeline downloads and uploads on every run, is plaintext JSON. Anyone with read access to the bucket (your CI runner, your laptop, anything assuming the role) gets the cleartext.
OpenTofu solves this with native, client-side state encryption introduced as a first-class feature. The state is encrypted by the engine before it leaves the machine. The remote backend never sees plaintext at all. The configuration looks like this:
```hcl
terraform {
  encryption {
    key_provider "aws_kms" "primary" {
      kms_key_id = "arn:aws:kms:us-east-1:..."
      region     = "us-east-1"
      key_spec   = "AES_256"
    }

    method "aes_gcm" "primary" {
      keys = key_provider.aws_kms.primary
    }

    state {
      method = method.aes_gcm.primary
    }

    plan {
      method = method.aes_gcm.primary
    }
  }
}
```
Three pieces. A key provider (AWS KMS, GCP KMS, OpenBao, or a local passphrase via pbkdf2), an encryption method (AES-GCM is the standard pick), and explicit targets for state, plan, or both.
The migration path from existing plaintext state requires a fallback block. OpenTofu refuses to read plaintext once encryption is enabled, which is the right default, but it means you need to tell it “this one time, read the legacy state and re-encrypt it.” After one successful apply, you remove the fallback and you’re done.
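To make the fallback semantics concrete, here is a toy model in Python. It is not OpenTofu's implementation and does no real cryptography: `toy_encrypt`, `toy_decrypt`, the `ENC:` marker, and the `allow_plaintext_fallback` flag are all stand-ins I made up to show the read and write rules described above.

```python
import json


class PlaintextRefused(Exception):
    """Raised when plaintext state is read without an explicit fallback."""


def is_encrypted(blob: str) -> bool:
    # Toy marker: this sketch tags "ciphertext" with a prefix.
    return blob.startswith("ENC:")


def toy_encrypt(state: dict) -> str:
    # Placeholder for AES-GCM. Reversing the JSON is NOT encryption.
    return "ENC:" + json.dumps(state)[::-1]


def toy_decrypt(blob: str) -> dict:
    return json.loads(blob[len("ENC:"):][::-1])


def read_state(blob: str, allow_plaintext_fallback: bool = False) -> dict:
    """Read encrypted state, or legacy plaintext only if a fallback is set."""
    if is_encrypted(blob):
        return toy_decrypt(blob)
    if allow_plaintext_fallback:
        return json.loads(blob)
    raise PlaintextRefused("state is not encrypted and no fallback is configured")


def write_state(state: dict) -> str:
    """Writes always go through the primary (encrypting) method."""
    return toy_encrypt(state)


legacy = json.dumps({"serial": 1})
state = read_state(legacy, allow_plaintext_fallback=True)  # one-time migration read
migrated = write_state(state)                              # re-encrypted on write
print(is_encrypted(migrated))
```

The point of the shape: the fallback only widens what can be read, never what gets written, so a single apply converts the state and the fallback can then be deleted.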
Terraform doesn’t have this. Hashicorp’s official answer is still “use a backend that encrypts at rest and audit your IAM policies carefully.” Which is fine, until your CI logs the state diff into a third-party observability tool, or someone runs `terraform show` over a Slack screenshare, or an attacker gets a transient role to your backend bucket. The threat model OpenTofu’s encryption closes is the threat model that matters.
The AI Wrinkle
There’s a meta-argument unfolding alongside all of this: AI is making the choice of execution engine less important.
Industry telemetry says 71% of cloud teams have seen an exponential increase in IaC volume from generative AI. The thing AI is generating, in most cases, is HCL, which is the lingua franca for both Terraform and OpenTofu. As the volume of AI-authored infrastructure grows, the role of HCL shifts from “the language engineers write” toward “the intermediate representation an agent emits.” Manual HCL authoring is on track to become a niche skill in the same way hand-tuning compiler output is a niche skill.
In that world, the execution engine is plumbing. The valuable layer is everything around it: state management, drift detection, policy enforcement, cost guardrails, audit trails. Which is exactly the layer where vendor lock-in does the most damage and where open governance matters most. The AI argument doesn’t undercut the OpenTofu case. It reinforces it.
What To Do
If you’re starting a new infrastructure project, use OpenTofu. There is no good reason to start a 2026 greenfield project on a single-vendor BSL-licensed engine when the Linux Foundation-governed open-source alternative is right there, with full HCL parity, the same provider ecosystem, and features Terraform doesn’t have.
If you have an existing Terraform codebase, schedule the migration. It’s a one-afternoon job per repo. Get state encryption while you’re at it.
If you’re heavily integrated with HCP Terraform, this is the harder case. The migration off the proprietary HCP features (Sentinel policies, the registry, the integrated dashboards) is real work. But it’s also the case where you have the most to lose. HCP Terraform’s pricing model has gotten aggressively worse, and OpenTofu’s existence means you have actual leverage in the next renewal conversation. The next post in this series digs into exactly what HCP pricing looks like in 2026 and why so many organizations are getting six-figure renewal quotes for infrastructure they were paying $20K for two years ago.
This is the first of a four-part series on the 2026 IaC landscape. Up next: cloud-native vs cloud-agnostic tooling, and when to use AWS CDK, Bicep, or Config Connector instead of Terraform/OpenTofu at all.
Sources
- 2026 IaC Predictions: What Cloud Leaders Must Prepare For ControlMonkey
- Terraform vs OpenTofu in 2026: Should You Stay or Switch?
- Terraform or OpenTofu in 2026? Here’s What I Actually Think Jae Wook Kim
- OpenTofu vs Terraform in 2026: Is the Fork Finally Worth It? Mechcloud Academy
- OpenTofu vs. Terraform: A Practical Guide for Enterprise Infrastructure Teams env0
- State and Plan Encryption OpenTofu docs
- How to Use OpenTofu State Encryption OneUptime
- State Encryption with OpenTofu Ned in the Cloud
/ DevOps / Infrastructure / Opentofu / Terraform
-
The Data Lakehouse Won. Now Pick a Table Format.
If you’ve been ignoring the data infrastructure conversation for the last few years, here’s where we landed in 2026: the data lakehouse won. The data warehouse vendors will fight about it for another decade, but the architectural argument is over.
Let me back up.
The Quick History
At the bottom of every modern data stack is a cloud storage bucket. S3, Azure Blob, GCS. Pick your hyperscaler. A bucket is dumb on purpose. It stores files cheaply and durably and doesn’t care what’s in them. No schemas, no transactions, no relational anything. Just objects.
When you dump raw logs, IoT telemetry, and CSV exports into a bucket without any organizing layer, congratulations, you have a data lake. Cheap, flexible, and almost completely useless for analytics until someone builds a pipeline to make sense of it.
The traditional answer to that mess was a data warehouse. Snowflake, Redshift, BigQuery, the whole gang. You force your data through ETL, conform it to a strict schema, and pay a premium to keep it sitting in the vendor’s proprietary storage format. You get fast SQL, ACID transactions, and a vendor lock-in problem so severe that exporting your data becomes a major friction point.
The lakehouse is what happens when someone finally says: what if we kept the cheap object storage, but added the warehouse features as a layer on top?
What a Lakehouse Actually Is
The trick is decoupling. Storage stays in your bucket. Compute is whatever engine you point at it. Metadata lives in an open table format that turns a pile of Parquet files into something that behaves like a real database table.
One copy of the data. Multiple engines can query it. Schema evolution, time travel, ACID transactions, all without copying everything into a proprietary system. From what I’ve read, teams that move from a pure warehouse to a lakehouse are able to cut storage costs noticeably in the process, and they stop fighting their ML team about getting access to the same data.
That’s the pitch, and it’s a good one. The hard part is picking your table format.
The Four Formats Worth Knowing
Apache Iceberg
Iceberg is the one to bet on if you care about not getting locked in. It came out of Netflix and even Snowflake and Databricks have been forced to support it. The metadata is hierarchical, which sounds boring but matters: it lets query engines skip enormous chunks of irrelevant data without listing directories one by one. Iceberg also handles partition evolution gracefully, so you can change your partitioning strategy without rewriting petabytes of history.
If I’m starting a new lakehouse in 2026 and I don’t have a strong reason to pick something else, it’s Iceberg.
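The metadata pruning described above can be sketched in a few lines. This is a simplified model, not Iceberg's actual file layout or API: the `Manifest` and `DataFile` classes, the `day` column, and the file names are hypothetical. The idea is that each level carries min/max summaries, so a planner can discard a whole manifest without opening it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataFile:
    path: str
    min_day: str  # lower bound of the `day` column in this file
    max_day: str  # upper bound of the `day` column in this file


@dataclass
class Manifest:
    path: str
    min_day: str  # summary covering every file listed in the manifest
    max_day: str
    files: List[DataFile]


def overlaps(lo: str, hi: str, q_lo: str, q_hi: str) -> bool:
    """True when [lo, hi] intersects the query range [q_lo, q_hi]."""
    return lo <= q_hi and hi >= q_lo


def plan_scan(manifest_list: List[Manifest], q_lo: str, q_hi: str) -> List[str]:
    """Return only the data files that could contain rows in the range."""
    selected = []
    for manifest in manifest_list:
        if not overlaps(manifest.min_day, manifest.max_day, q_lo, q_hi):
            continue  # skip the whole manifest without reading its entries
        for data_file in manifest.files:
            if overlaps(data_file.min_day, data_file.max_day, q_lo, q_hi):
                selected.append(data_file.path)
    return selected


manifests = [
    Manifest("m1.avro", "2026-01-01", "2026-01-31", [
        DataFile("jan-a.parquet", "2026-01-01", "2026-01-15"),
        DataFile("jan-b.parquet", "2026-01-16", "2026-01-31"),
    ]),
    Manifest("m2.avro", "2026-02-01", "2026-02-28", [
        DataFile("feb-a.parquet", "2026-02-01", "2026-02-28"),
    ]),
]
print(plan_scan(manifests, "2026-01-20", "2026-01-25"))  # ['jan-b.parquet']
```

No directory listing happens anywhere in that loop, which is the whole advantage over the old Hive-style layout.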
Delta Lake
Delta is what Databricks ships and what everyone using Spark already knows. It uses an append-only transaction log in a `_delta_log` directory, and it’s beautifully integrated with the Databricks platform. Z-Ordering, native Spark performance, the whole ecosystem.
If your team lives inside Databricks, Delta is the obvious answer. If you don’t, the calculus is harder, because Delta’s openness has improved a lot but it still feels most at home in the Databricks world.
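If the append-only log sounds abstract, here is a rough sketch of how a reader could replay it. Each commit is a list of JSON action lines, and `add` and `remove` actions change the set of live data files. The function name, the `as_of` parameter, and the sample paths are my own, and a real log carries far more detail (stats, checkpoints, protocol versions) that I'm ignoring.

```python
import json
from typing import Dict, List, Optional, Set


def live_files(commits: Dict[int, List[str]], as_of: Optional[int] = None) -> Set[str]:
    """Replay commits in version order and return the current data files.

    Passing `as_of` stops the replay early, which is a crude form of time travel.
    """
    files: Set[str] = set()
    for version in sorted(commits):
        if as_of is not None and version > as_of:
            break
        for line in commits[version]:
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
            # Other action types don't change the file set in this sketch.
    return files


log = {
    0: ['{"add": {"path": "part-0.parquet"}}', '{"add": {"path": "part-1.parquet"}}'],
    1: ['{"remove": {"path": "part-0.parquet"}}', '{"add": {"path": "part-2.parquet"}}'],
}
print(sorted(live_files(log)))           # ['part-1.parquet', 'part-2.parquet']
print(sorted(live_files(log, as_of=0)))  # ['part-0.parquet', 'part-1.parquet']
```

Because old commits are never rewritten, reading the table as of an earlier version is just a shorter replay.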
Apache Hudi
Hudi came out of Uber and it was built for one thing: high-frequency upserts. If your problem is Change Data Capture, streaming ingestion, or constant record-level updates, Hudi is probably your answer. It gives you two storage modes. Copy-on-Write rewrites files on update so reads stay fast. Merge-on-Read writes deltas and reconciles at query time, which is what you want when writes are heavy and reads are flexible.
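The difference between the two modes is easier to see with a toy model. In this sketch a "file" is just a Python dict and the function names are mine, not Hudi's API. Copy-on-Write pays at write time by producing a whole new base, while Merge-on-Read appends to a delta log and pays at read time.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]
Update = Tuple[str, Record]  # (record key, new record)


def cow_upsert(base: Dict[str, Record], updates: List[Update]) -> Dict[str, Record]:
    """Copy-on-Write: rewrite the whole base file with the updates applied."""
    new_base = dict(base)  # every existing record is copied into the new file
    for key, record in updates:
        new_base[key] = record
    return new_base


def mor_upsert(delta_log: List[Update], updates: List[Update]) -> List[Update]:
    """Merge-on-Read: append updates to a log and leave the base untouched."""
    return delta_log + updates


def mor_read(base: Dict[str, Record], delta_log: List[Update]) -> Dict[str, Record]:
    """Reconcile base and log at query time; later log entries win."""
    merged = dict(base)
    for key, record in delta_log:
        merged[key] = record
    return merged


base = {"u1": {"city": "Austin"}, "u2": {"city": "Boise"}}
updates = [("u2", {"city": "Denver"}), ("u3", {"city": "Erie"})]

cow_table = cow_upsert(base, updates)
delta_log = mor_upsert([], updates)
print(cow_table == mor_read(base, delta_log))  # True
```

Both paths return the same rows. What changes is who does the merge work, the writer or the reader.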
Hudi is the right pick when your pipeline is full of `UPSERT` and you can’t afford to rewrite large files every time something changes.
Apache Paimon
Paimon is the newest of the four and it’s worth keeping an eye on. It came from the Flink world and uses an LSM-tree style organization, which is what databases like RocksDB use under the hood. The whole point is unifying batch and streaming in a single format. If you’re doing real-time event-driven work and don’t want to maintain a separate streaming and batch stack, Paimon is interesting.
It’s not the safe choice yet, but it’s the one I’d watch most closely over the next two years.
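For readers who haven't met LSM trees, here is a small sketch of the core move: writes land in sorted runs, and compaction merges the runs while keeping only the newest version of each key. This illustrates the general LSM idea, not Paimon's code, and the `(key, sequence, value)` entry layout is an assumption of mine.

```python
import heapq
from typing import List, Tuple

Entry = Tuple[str, int, str]  # (key, sequence number, value)


def compact(runs: List[List[Entry]]) -> List[Entry]:
    """Merge key-sorted runs, keeping the highest sequence number per key."""
    merged: List[Entry] = []
    # Order by key, then by descending sequence so the newest version comes first.
    for entry in heapq.merge(*runs, key=lambda e: (e[0], -e[1])):
        if merged and merged[-1][0] == entry[0]:
            continue  # older version of a key that was already emitted
        merged.append(entry)
    return merged


older_run = [("a", 1, "a1"), ("c", 1, "c1")]
newer_run = [("a", 2, "a2"), ("b", 2, "b2")]
print(compact([older_run, newer_run]))
# [('a', 2, 'a2'), ('b', 2, 'b2'), ('c', 1, 'c1')]
```

Streaming writes stay cheap because they only ever append a new run, and batch readers get a clean sorted view once compaction has caught up.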
So Which One?
Honestly, the answer depends less on the format and more on which ecosystem you’re already in.
- Mostly Spark and Databricks? Delta.
- Streaming-heavy with constant upserts? Hudi.
- Real-time event-driven and willing to bet on newer tech? Paimon.
- Anything else, or you want to keep your options open? Iceberg.
The format wars have mostly converged. Most major engines support multiple formats now, and the gap between them on raw query performance has shrunk. The choice is more about operational fit than performance ceilings.
The lakehouse pattern itself is the real story. The format is just plumbing.
/ Data / Infrastructure / Lakehouse / Iceberg
-
How Kong Actually Works in Kubernetes
At some point with microservices in Kubernetes, basic Ingress routing stops being enough. Kong is an interesting router that I would like to try in the future.
It’s an API Gateway built on top of NGINX and OpenResty. It operates at the infrastructure layer, managing the actual HTTP traffic flowing into your cluster. Drop it into a Kubernetes environment and it acts as an Ingress Controller. It does that job really well.
The Ingress Controller Problem
It’s worth reviewing what an ingress controller does, in case you’re unfamiliar with its job in Kubernetes. An `Ingress` resource is just a set of routing rules. “Send traffic for `api.example.com/v1` to the `user-service` Service.” Kubernetes doesn’t actually route traffic itself. It needs a controller to read those rules and move the packets.
The Kong Ingress Controller (KIC) runs as a pod inside your cluster. It watches the Kubernetes API server for changes to Ingress resources, Services, and Endpoints. When someone deploys a new app and creates an Ingress rule, KIC picks it up, translates the Kubernetes config into Kong’s native format, and reloads the proxy. No manual intervention.
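The translation step is the part worth picturing. Below is a minimal sketch that flattens a `networking.k8s.io/v1` Ingress (as a plain dict) into route entries. The output shape is something I invented for illustration and is not Kong's native config format, and port 80 is an assumed value.

```python
from typing import Dict, List


def ingress_to_routes(ingress: Dict) -> List[Dict]:
    """Flatten an Ingress spec into (host, path, service, port) route entries."""
    routes = []
    for rule in ingress["spec"].get("rules", []):
        host = rule.get("host", "*")  # a rule without a host matches any host
        for path in rule.get("http", {}).get("paths", []):
            service = path["backend"]["service"]
            routes.append({
                "host": host,
                "path": path.get("path", "/"),
                "service": service["name"],
                "port": service["port"]["number"],
            })
    return routes


ingress = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "Ingress",
    "metadata": {"name": "user-api"},
    "spec": {
        "rules": [{
            "host": "api.example.com",
            "http": {"paths": [{
                "path": "/v1",
                "pathType": "Prefix",
                "backend": {"service": {"name": "user-service", "port": {"number": 80}}},
            }]},
        }],
    },
}
print(ingress_to_routes(ingress))
```

A real controller runs something like this inside a watch loop, recomputing the route table whenever the API server reports a change.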
How Traffic Actually Flows
When external traffic hits your cluster, the path looks like this:
- External Load Balancer forwards traffic to the Kong proxy pods
- Kong evaluates the incoming request against its routing table (headers, paths, hostnames)
- Plugins execute before the request is proxied upstream, handling cross-cutting concerns at the edge instead of inside your application code
- Upstream routing sends traffic directly to Pod IPs, bypassing `kube-proxy` for better performance
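The steps above can be sketched as a tiny request handler. This is a toy model of the flow, not how Kong is implemented: the `Route` class, the `require_api_key` plugin, and the Pod IPs are all hypothetical, and round-robin stands in for whatever load-balancing policy you configure.

```python
import itertools
from typing import Callable, Dict, List, Optional

Request = Dict[str, str]
# A plugin returns an HTTP status code to short-circuit, or None to continue.
Plugin = Callable[[Request], Optional[int]]


def require_api_key(request: Request) -> Optional[int]:
    return None if request.get("apikey") else 401


class Route:
    def __init__(self, host: str, path_prefix: str,
                 plugins: List[Plugin], pod_ips: List[str]) -> None:
        self.host = host
        self.path_prefix = path_prefix
        self.plugins = plugins
        self.pods = itertools.cycle(pod_ips)  # naive round-robin over Pod IPs

    def matches(self, request: Request) -> bool:
        return (request["host"] == self.host
                and request["path"].startswith(self.path_prefix))


def handle(request: Request, routes: List[Route]) -> str:
    for route in routes:
        if not route.matches(request):
            continue
        for plugin in route.plugins:  # plugins run before the upstream call
            status = plugin(request)
            if status is not None:
                return f"rejected {status}"
        return f"proxied to {next(route.pods)}"  # straight to a Pod IP
    return "rejected 404"


routes = [Route("api.example.com", "/v1", [require_api_key], ["10.0.0.1", "10.0.0.2"])]
print(handle({"host": "api.example.com", "path": "/v1/users"}, routes))
print(handle({"host": "api.example.com", "path": "/v1/users", "apikey": "k"}, routes))
```

The first request is rejected at the gateway and never reaches a service, which is exactly the work you no longer have to do in application code.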
That plugin step is where Kong really earns its keep. Rate limiting, API key auth, mTLS, request transformation. All of that happens at the gateway layer so your services don’t have to think about it.
CRDs Make It Actually Useful
Standard Kubernetes Ingress is pretty limited. Host-based routing, path-based routing, and that’s about it. Kong extends this with Custom Resource Definitions:
- KongPlugin lets you attach behaviors to routes or services. Deploy a manifest to enforce rate limits, require API keys, or add mTLS to a specific endpoint.
- KongConsumer manages user identities and credentials directly in Kubernetes, so you can tie routing rules or rate limits to specific clients.
This means your API gateway configuration lives right alongside your application manifests. Version controlled, reviewable, deployable through your normal CI/CD pipeline.
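As a rough illustration, here is what a rate-limiting `KongPlugin` and its attachment could look like, built as Python dicts and dumped to JSON (which `kubectl apply -f` accepts as well as YAML). The resource names, namespace, and limit of 5 per minute are placeholders I picked, and you should check the field names against the KIC version you run.

```python
import copy
import json

# A KongPlugin resource configuring the rate-limiting plugin.
rate_limit_plugin = {
    "apiVersion": "configuration.konghq.com/v1",
    "kind": "KongPlugin",
    "metadata": {"name": "rate-limit-5-per-minute", "namespace": "default"},
    "plugin": "rate-limiting",
    "config": {"minute": 5, "policy": "local"},
}


def attach_plugin(resource: dict, plugin_name: str) -> dict:
    """Return a copy of a Service or Ingress manifest annotated with the plugin."""
    annotated = copy.deepcopy(resource)
    annotations = annotated.setdefault("metadata", {}).setdefault("annotations", {})
    annotations["konghq.com/plugins"] = plugin_name
    return annotated


service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "user-service", "namespace": "default"},
    "spec": {"ports": [{"port": 80}], "selector": {"app": "user-service"}},
}

limited_service = attach_plugin(service, rate_limit_plugin["metadata"]["name"])
print(json.dumps(rate_limit_plugin, indent=2))
print(json.dumps(limited_service, indent=2))
```

In practice you would commit the two manifests rather than generate them, but the shape is the point: the plugin is one resource, and an annotation on the Service or Ingress opts it in.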
Skip the Database
Kong used to require PostgreSQL or Cassandra to store its routing config. In modern Kubernetes deployments, you almost always run it in DB-less mode instead.
Why? Kubernetes already has `etcd` as its source of truth for cluster state. Running a second database just for the API gateway adds overhead and failure modes you don’t need. In DB-less mode, Kong stores its configuration entirely in memory. The Ingress Controller reads state from Kubernetes and pushes updates to the proxy dynamically.
This is one of those decisions that sounds minor but changes everything about how you operate Kong. No database backups to worry about. No schema migrations. Your gateway config is just Kubernetes manifests managed through GitOps.
Observability at the Edge
Sitting at the edge of the cluster, Kong is perfectly positioned to capture metrics, logs, and traces. With the right plugins, it exports traffic data (latency, status codes, request volumes) directly into whatever observability stack you’re running.
You get visibility across your entire microservice architecture without instrumenting every individual service.
Kong isn’t the only Ingress controller out there, but the combination of plugin architecture, DB-less mode, and CRD-based configuration makes it a solid choice if you need more than basic routing. If you’re already running Kubernetes and find yourself writing the same auth and rate-limiting logic across multiple services, moving that to the gateway layer is worth your time.
/ DevOps / Kubernetes / Kong / Infrastructure