{
  "version": "https://jsonfeed.org/version/1",
  "title": "Snowflake on LLBBL Blog",
  "icon": "https://avatars.micro.blog/avatars/2023/40/125738.jpg",
  "home_page_url": "https://llbbl.blog/",
  "feed_url": "https://llbbl.blog/feed.json",
  "items": [
      {
        "id": "http://llbbl.micro.blog/2026/05/04/the-real-cost-of-your.html",
        "title": "The Real Cost of Your Data Lake (It's Not the Storage)",
        "content_html": "<p>If you&rsquo;re sketching out a data platform on a whiteboard right now, I want you to do something. Stop calculating storage costs. They&rsquo;re not the bill.</p>\n<p>I pulled the public pricing for AWS, Azure, GCP, Databricks, and Snowflake and stacked them next to each other. Storage is the cheap part. The expensive part is everything that <em>moves</em> the data, and the expensive part is the part you&rsquo;re least likely to model correctly when you&rsquo;re picking a vendor.</p>\n<p>Let me walk through what actually shows up on the invoice.</p>\n<h2 id=\"raw-object-storage-is-basically-free\">Raw Object Storage Is Basically Free</h2>\n<p>For hot, frequently accessed data, the big three are within a rounding error of each other:</p>\n<ul>\n<li><strong>Azure Blob (LRS, Hot):</strong> $0.018 per GB/month</li>\n<li><strong>Google Cloud Standard:</strong> $0.020 per GB/month</li>\n<li><strong>AWS S3 Standard:</strong> $0.023 per GB/month (first 50 TB)</li>\n</ul>\n<p>Drop into the cool tiers and AWS S3 takes the lead at $0.0125 per GB. Drop into deep archive and you&rsquo;re paying $0.00099 per GB on either AWS Glacier Deep Archive or Azure Archive. That&rsquo;s a tenth of a cent per gigabyte, per month, for data you almost never touch.</p>\n<p>Good for you, but I think anyone leading with &ldquo;per-GB storage cost&rdquo; in a procurement deck is selling you a story. Storage capacity is roughly <strong>five percent</strong> of a typical Databricks bill. Five. The other 95% is the part nobody wants to talk about.</p>\n<h2 id=\"the-egress-trap\">The Egress Trap</h2>\n<p>Ingress is free. Always. 
The cloud providers want your data in.</p>\n<p>Getting it back out is where they collect.</p>\n<ul>\n<li><strong>Azure Blob:</strong> $0.087/GB external egress</li>\n<li><strong>AWS S3:</strong> $0.090/GB</li>\n<li><strong>Google Cloud:</strong> $0.120/GB (but free if you stay inside Google&rsquo;s ecosystem, which is the whole point of that pricing)</li>\n</ul>\n<p>Then layer on API operations. A million GET requests on S3 costs about $0.40. The same million GETs on Google Cloud Storage can run closer to $5.00 because they classify operations differently. If your analytics workload is hammering small files, those API calls add up faster than the storage they&rsquo;re reading.</p>\n<p>Storing 10 TB? Maybe $200 a month. Storing 500 TB? You&rsquo;re at $10,000 a month before a single byte leaves the region or a single query fires.</p>\n<h2 id=\"databricks-two-bills-one-headache\">Databricks: Two Bills, One Headache</h2>\n<p>Databricks uses what&rsquo;s commonly called a <a href=\"https://www.dawiso.com/glossary/databricks-pricing-explained-real-cost-breakdown-for-2025\">Two-Bill Model</a>. You get one invoice from your cloud provider for the actual VMs and storage, and a separate invoice from Databricks for the software, measured in DBUs (Databricks Units).</p>\n<p>In a typical mid-sized deployment around $18,000/month, the breakdown looks like this:</p>\n<ul>\n<li>VM compute from the cloud provider: ~55%</li>\n<li>Databricks DBU fees: ~30%</li>\n<li>Storage: ~5%</li>\n<li>Network egress: ~5%</li>\n</ul>\n<p>The DBU rate changes based on what you&rsquo;re doing. Automated jobs start at $0.15/DBU. Interactive notebooks for analysts start at $0.40/DBU. That&rsquo;s not an accident. 
Databricks wants you running production workloads on cheap job clusters, not on the expensive all-purpose clusters your data scientists love to leave running over a weekend.</p>\n<p>If you&rsquo;re not actively pushing teams toward job clusters and ARM-based instances, you&rsquo;re leaving real money on the table.</p>\n<h2 id=\"snowflake-the-hidden-storage-multiplier\">Snowflake: The Hidden Storage Multiplier</h2>\n<p>Snowflake&rsquo;s pricing pitch sounds clean. Pass-through storage at $40/TB/month on-demand, dropping to $23/TB/month with a capacity commitment. Compute as Credits. Done.</p>\n<p>Except it isn&rsquo;t done. Snowflake stores data in immutable 16MB micro-partitions. Immutable. You can&rsquo;t change them in place. Update a single row in a 1 TB table and Snowflake writes a new file and keeps the old one around.</p>\n<p>Why keep the old one? Two features:</p>\n<ul>\n<li><strong>Time Travel:</strong> query historical states of your data for up to 90 days</li>\n<li><strong>Fail-Safe:</strong> a 7-day disaster recovery window you cannot turn off</li>\n</ul>\n<p>This is the part that gets people. A 1 TB table that&rsquo;s getting updated multiple times a day can <a href=\"https://select.dev/posts/snowflake-pricing\">balloon to 25 TB of <em>billed</em> storage</a> because Snowflake is retaining every prior version of every micro-partition you&rsquo;ve touched. Your dashboard says &ldquo;1 TB table.&rdquo; Your invoice says otherwise.</p>\n<p>And compute? Virtual Warehouses bill per second, but with a 60-second minimum every single time you resume or resize. Aggressive auto-suspend sounds like a cost optimization. It&rsquo;s not. 
If you&rsquo;re spinning a warehouse up and down every 30 seconds, you&rsquo;re paying the 60-second minimum every time and quietly multiplying your bill.</p>\n<h2 id=\"what-id-actually-do\">What I&rsquo;d Actually Do</h2>\n<p>A few things I&rsquo;d put on the wall before signing anything:</p>\n<ul>\n<li><strong>Model egress, not storage.</strong> Run your worst-case query pattern through the calculator. Storage is noise.</li>\n<li><strong>Lifecycle everything.</strong> Going by the numbers above, cool tiers run at roughly half the hot price and deep archive is about 20x cheaper. If your data is older than 90 days and nobody&rsquo;s queried it, it shouldn&rsquo;t be in hot storage.</li>\n<li><strong>For Databricks:</strong> push every recurring workload to job compute. Audit interactive cluster usage monthly.</li>\n<li><strong>For Snowflake:</strong> if you have high-frequency update patterns, profile your actual storage footprint, not your logical table size. The gap will surprise you.</li>\n<li><strong>For multi-cloud:</strong> don&rsquo;t. Egress will eat the savings before you finish the architecture diagram.</li>\n</ul>\n<p>The vendors all have a story about why their model is the cheap one. Read past the per-GB number on the slide. 
The bill is somewhere else.</p>\n<p>Happy modeling.</p>\n<h2 id=\"sources\">Sources</h2>\n<ul>\n<li><a href=\"https://www.dawiso.com/glossary/databricks-pricing-explained-real-cost-breakdown-for-2025\">Databricks Pricing Explained (Dawiso)</a> — Two-Bill Model, DBU breakdown</li>\n<li><a href=\"https://select.dev/posts/snowflake-pricing\">Snowflake Pricing Explained (SELECT.dev)</a> — Time Travel storage multiplier, micro-partition behavior</li>\n<li><a href=\"https://www.finout.io/blog/cloud-storage-pricing-comparison\">Cloud &amp; AI Storage Pricing Comparison 2026 (Finout)</a> — AWS / Azure / GCP per-GB and tier pricing</li>\n<li><a href=\"https://www.ai-infra-link.com/s3-vs-gcs-vs-azure-blob-storage-2025-cloud-storage-showdown-performance-pricing-features-compared/\">S3 vs GCS vs Azure Blob Storage (ai-infra-link)</a> — Egress and API operation pricing</li>\n<li><a href=\"https://www.cloudzero.com/blog/snowflake-pricing/\">Snowflake Pricing in 2026 (CloudZero)</a> — Virtual Warehouse 60-second minimum behavior</li>\n</ul>\n<p>I&rsquo;d appreciate a follow. You can subscribe with your email below. The emails go out once a week, or you can find me on Mastodon at <a href=\"https://micro.blog/llbbl?remote_follow=1\">@logan@llbbl.blog</a>.</p>\n",
        "date_published": "2026-05-04T10:00:00-05:00",
        "url": "https://llbbl.blog/2026/05/04/the-real-cost-of-your.html",
        "tags": ["DevOps","Cloud","Data","Snowflake","Databricks"]
      }
  ]
}
