<rss version="2.0">
  <channel>
    <title>Databricks on LLBBL Blog</title>
    <link>https://llbbl.blog/categories/databricks/</link>
    <description></description>
    
    <language>en</language>
    
    <lastBuildDate>Mon, 04 May 2026 10:00:00 -0500</lastBuildDate>
    
    <item>
      <title>The Real Cost of Your Data Lake (It&#39;s Not the Storage)</title>
      <link>https://llbbl.blog/2026/05/04/the-real-cost-of-your.html</link>
      <pubDate>Mon, 04 May 2026 10:00:00 -0500</pubDate>
      
      <guid>http://llbbl.micro.blog/2026/05/04/the-real-cost-of-your.html</guid>
      <description>&lt;p&gt;If you&amp;rsquo;re sketching out a data platform on a whiteboard right now, I want you to do something. Stop calculating storage costs. They&amp;rsquo;re not the bill.&lt;/p&gt;
&lt;p&gt;I pulled the public pricing for AWS, Azure, GCP, Databricks, and Snowflake and stacked them next to each other. Storage is the cheap part. The expensive part is everything that &lt;em&gt;moves&lt;/em&gt; the data, and it&amp;rsquo;s also the part you&amp;rsquo;re least likely to model correctly when you&amp;rsquo;re picking a vendor.&lt;/p&gt;
&lt;p&gt;Let me walk through what actually shows up on the invoice.&lt;/p&gt;
&lt;h2 id=&#34;raw-object-storage-is-basically-free&#34;&gt;Raw Object Storage Is Basically Free&lt;/h2&gt;
&lt;p&gt;For hot, frequently accessed data, the big three are within a rounding error of each other:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Azure Blob (LRS, Hot):&lt;/strong&gt; $0.018 per GB/month&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Google Cloud Standard:&lt;/strong&gt; $0.020 per GB/month&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AWS S3 Standard:&lt;/strong&gt; $0.023 per GB/month (first 50 TB)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Drop into the infrequent-access tiers and AWS S3 Standard-IA comes in at $0.0125 per GB. Drop into deep archive and you&amp;rsquo;re paying $0.00099 per GB on either AWS Glacier Deep Archive or Azure Archive. That&amp;rsquo;s about a tenth of a cent per gigabyte, per month, for data you almost never touch.&lt;/p&gt;
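&lt;p&gt;To put those tier prices side by side, here&amp;rsquo;s a small Python sketch that prices the same dataset in each AWS tier using the per-GB rates above. It&amp;rsquo;s a back-of-the-envelope model with stated assumptions: a flat per-GB rate, no retrieval fees, request charges, or minimum storage durations, and a hypothetical 100 TB dataset treated as 100,000 GB.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;
```python
# Per-GB monthly list prices quoted in the post (AWS).
AWS_TIER_RATES = {
    "S3 Standard": 0.023,
    "S3 Standard-IA": 0.0125,
    "Glacier Deep Archive": 0.00099,
}


def monthly_cost_by_tier(dataset_gb: float, rates: dict[str, float]) -> dict[str, float]:
    """Flat-rate monthly storage cost for the same dataset in each tier."""
    return {tier: dataset_gb * rate for tier, rate in rates.items()}


costs = monthly_cost_by_tier(100_000, AWS_TIER_RATES)  # hypothetical 100 TB
hot = costs["S3 Standard"]
for tier, cost in costs.items():
    # Ratio of the hot-tier bill to this tier's bill.
    print(f"{tier:22} ${cost:10,.2f}/month  (hot storage costs {hot / cost:.1f}x as much)")
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With those inputs, Standard-IA comes to about $1,250 a month against $2,300 for Standard, and Deep Archive comes to about $99. Hot storage therefore costs roughly 1.8x and 23x as much.&lt;/p&gt;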
&lt;p&gt;Nice numbers. But I think anyone leading with &amp;ldquo;per-GB storage cost&amp;rdquo; in a procurement deck is selling you a story. Storage capacity is roughly &lt;strong&gt;five percent&lt;/strong&gt; of a typical Databricks bill. Five. The other 95% is the part nobody wants to talk about.&lt;/p&gt;
&lt;h2 id=&#34;the-egress-trap&#34;&gt;The Egress Trap&lt;/h2&gt;
&lt;p&gt;Ingress is free. Always. The cloud providers want your data in.&lt;/p&gt;
&lt;p&gt;Getting it back out is where they collect. Here are the first-tier rates for egress to the internet:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Azure Blob:&lt;/strong&gt; $0.087/GB external egress&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AWS S3:&lt;/strong&gt; $0.090/GB&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Google Cloud:&lt;/strong&gt; $0.120/GB (though transfers to other Google Cloud services in the same location are free, which is the whole point of that pricing)&lt;/li&gt;
&lt;/ul&gt;
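&lt;p&gt;To make that concrete, here&amp;rsquo;s a minimal Python sketch that prices pulling the same amount of data out of each cloud at the first-tier egress rates listed above. It applies one flat rate to the whole transfer. That&amp;rsquo;s a simplification, because real bills are tiered by monthly volume. The 10,000 GB transfer is hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;
```python
# First-tier internet egress rates quoted in the post, in $/GB.
EGRESS_RATES = {
    "Azure Blob": 0.087,
    "AWS S3": 0.090,
    "Google Cloud": 0.120,
}


def egress_cost(gb_out: float) -> dict[str, float]:
    """Flat-rate estimate of moving gb_out gigabytes out to the internet."""
    return {cloud: gb_out * rate for cloud, rate in EGRESS_RATES.items()}


for cloud, cost in egress_cost(10_000).items():  # hypothetical 10 TB pull
    print(f"{cloud:12} ${cost:,.2f}")
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Compare that with storage. At $0.023/GB, keeping those same 10,000 GB in S3 Standard for a month costs about $230, so a single full export costs roughly four months of storage.&lt;/p&gt;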
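&lt;p&gt;Then layer on API operations. A million GET requests on S3 Standard costs about $0.40, and Google Cloud Storage charges about the same for Standard-class reads, which it classifies as Class B operations. Writes and listings are where it jumps. S3 PUT and LIST requests and Google&amp;rsquo;s Class A operations both start around $5.00 per million, and colder storage classes charge more per request on both clouds. If your analytics workload is hammering small files, those API calls add up faster than the storage they&amp;rsquo;re reading.&lt;/p&gt;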
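&lt;p&gt;Here&amp;rsquo;s a rough Python sketch of the small-file effect, using the S3 Standard rates from this post. It makes three simplifications:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;one GET per object per full scan (real query engines may issue several ranged GETs per file)&lt;/li&gt;
&lt;li&gt;decimal units&lt;/li&gt;
&lt;li&gt;a hypothetical dataset size, file sizes, and scan frequency&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;
```python
GET_PRICE_PER_MILLION = 0.40  # S3 Standard GET requests, from the post
STORAGE_PRICE_PER_GB = 0.023  # S3 Standard, first 50 TB, from the post


def monthly_reads_vs_storage(
    dataset_gb: float, avg_object_kb: float, scans_per_month: int
) -> dict[str, float]:
    """Compare GET request charges with storage charges for the same dataset."""
    objects = dataset_gb * 1_000_000 / avg_object_kb  # 1 GB = 1,000,000 KB (decimal)
    get_requests = objects * scans_per_month  # assumes one GET per object per scan
    return {
        "objects": objects,
        "get_cost": get_requests / 1_000_000 * GET_PRICE_PER_MILLION,
        "storage_cost": dataset_gb * STORAGE_PRICE_PER_GB,
    }


# Hypothetical 1 TB table scanned daily, stored as 100 KB files vs 128 MB files.
small = monthly_reads_vs_storage(1_000, 100, 30)
compacted = monthly_reads_vs_storage(1_000, 128_000, 30)
print(f"small files:     GETs ${small['get_cost']:,.2f} vs storage ${small['storage_cost']:,.2f}")
print(f"compacted files: GETs ${compacted['get_cost']:,.2f} vs storage ${compacted['storage_cost']:,.2f}")
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Under those assumptions, the 100 KB layout runs about $120 a month in GETs against $23 of storage. Compacting to 128 MB files drops the GET bill to under ten cents.&lt;/p&gt;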
&lt;p&gt;Storing 10 TB at roughly $0.02 per GB? Maybe $200 a month. Storing 500 TB? You&amp;rsquo;re at about $10,000 a month before a single byte leaves the region or a single query fires.&lt;/p&gt;
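&lt;p&gt;Those round numbers assume a flat rate, but S3 Standard actually bills in volume tiers. Below is a minimal Python sketch of tiered pricing. The $0.023 first-50-TB rate comes from this post. The $0.022 (next 450 TB) and $0.021 (beyond 500 TB) rates are assumptions, based on AWS&amp;rsquo;s published us-east-1 list prices as I understand them, so verify them against the current price sheet. The sketch treats 1 TB as 1,000 GB.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;
```python
# (tier size in GB, $/GB-month); None means "everything above the previous tiers".
S3_STANDARD_TIERS = [
    (50_000, 0.023),   # first 50 TB (rate quoted in the post)
    (450_000, 0.022),  # next 450 TB (assumed us-east-1 list price)
    (None, 0.021),     # over 500 TB (assumed us-east-1 list price)
]


def s3_standard_monthly_cost(total_gb: float) -> float:
    """Walk the volume tiers, billing each slice of data at its tier's rate."""
    remaining = total_gb
    cost = 0.0
    for tier_size, rate in S3_STANDARD_TIERS:
        billable = remaining if tier_size is None else min(remaining, tier_size)
        cost += billable * rate
        remaining -= billable
    return cost


for tb in (10, 500):
    print(f"{tb} TB: ${s3_standard_monthly_cost(tb * 1_000):,.2f}/month")
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;At these list prices, 10 TB comes to about $230 a month and 500 TB to about $11,050. The $10,000 figure above corresponds to a flat $0.02/GB.&lt;/p&gt;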
&lt;h2 id=&#34;databricks-two-bills-one-headache&#34;&gt;Databricks: Two Bills, One Headache&lt;/h2&gt;
&lt;p&gt;Databricks uses what&amp;rsquo;s commonly called a &lt;a href=&#34;https://www.dawiso.com/glossary/databricks-pricing-explained-real-cost-breakdown-for-2025&#34;&gt;Two-Bill Model&lt;/a&gt;. You get one invoice from your cloud provider for the actual VMs and storage, and a separate invoice from Databricks for the software, measured in DBUs (Databricks Units).&lt;/p&gt;
&lt;p&gt;In a typical mid-sized deployment around $18,000/month, the breakdown looks like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;VM compute from the cloud provider: ~55%&lt;/li&gt;
&lt;li&gt;Databricks DBU fees: ~30%&lt;/li&gt;
&lt;li&gt;Storage: ~5%&lt;/li&gt;
&lt;li&gt;Network egress: ~5%&lt;/li&gt;
&lt;/ul&gt;
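&lt;p&gt;Turning those percentages into dollars is simple arithmetic. Here&amp;rsquo;s a short Python sketch using the $18,000/month figure above. The listed shares add up to 95%, so the sketch reports the remaining ~5% as unattributed rather than guessing what it is.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;
```python
MONTHLY_TOTAL = 18_000  # mid-sized deployment from the post, $/month

# Approximate shares of the bill, from the post.
SHARES = {
    "VM compute (cloud provider)": 0.55,
    "Databricks DBU fees": 0.30,
    "Storage": 0.05,
    "Network egress": 0.05,
}


def dollar_breakdown(total: float, shares: dict[str, float]) -> dict[str, float]:
    """Convert percentage shares to dollars and report anything left over."""
    dollars = {item: round(total * share, 2) for item, share in shares.items()}
    dollars["Unattributed"] = round(total - sum(dollars.values()), 2)
    return dollars


for item, amount in dollar_breakdown(MONTHLY_TOTAL, SHARES).items():
    print(f"{item:28} ${amount:,.2f}")
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Storage lands at roughly $900 of the $18,000, the same as egress, while VM compute and DBUs together account for $15,300.&lt;/p&gt;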
&lt;p&gt;The DBU rate changes based on what you&amp;rsquo;re doing. Automated jobs compute starts at $0.15/DBU. All-purpose compute, the interactive clusters analysts run notebooks on, starts at $0.40/DBU. That&amp;rsquo;s not an accident. Databricks wants you running production workloads on cheap job clusters, not on the expensive all-purpose clusters your data scientists love to leave running over a weekend.&lt;/p&gt;
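&lt;p&gt;A quick Python sketch shows why the job-versus-interactive split matters. It uses the two starting rates above and covers only the Databricks DBU fee, not the separate cloud VM bill. The inputs are hypothetical:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;a 4 DBU/hour cluster&lt;/li&gt;
&lt;li&gt;a 2-hour nightly run&lt;/li&gt;
&lt;li&gt;a 64-hour idle weekend&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Actual DBU rates vary by cloud, plan tier, and region.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;
```python
JOBS_RATE = 0.15         # $/DBU, automated jobs compute (starting rate from the post)
ALL_PURPOSE_RATE = 0.40  # $/DBU, interactive all-purpose compute (starting rate from the post)


def dbu_cost(dbus_per_hour: float, hours: float, rate_per_dbu: float) -> float:
    """Databricks software fee only; the cloud provider bills the VMs separately."""
    return dbus_per_hour * hours * rate_per_dbu


DBUS_PER_HOUR = 4        # hypothetical cluster size
NIGHTLY_HOURS = 2 * 30   # hypothetical: 2 hours a night for 30 nights
IDLE_WEEKEND_HOURS = 64  # Friday 6 pm to Monday 10 am

as_job = dbu_cost(DBUS_PER_HOUR, NIGHTLY_HOURS, JOBS_RATE)
as_interactive = dbu_cost(DBUS_PER_HOUR, NIGHTLY_HOURS, ALL_PURPOSE_RATE)
idle_weekend = dbu_cost(DBUS_PER_HOUR, IDLE_WEEKEND_HOURS, ALL_PURPOSE_RATE)

print(f"nightly pipeline on job compute:         ${as_job:,.2f}/month")
print(f"same pipeline on all-purpose compute:    ${as_interactive:,.2f}/month")
print(f"one idle all-purpose cluster, 1 weekend: ${idle_weekend:,.2f}")
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Under these assumptions, one forgotten weekend costs more in DBU fees than a full month of the nightly pipeline, whether that pipeline runs as a job or interactively.&lt;/p&gt;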
&lt;p&gt;If you&amp;rsquo;re not actively pushing teams toward job clusters and ARM-based instances, you&amp;rsquo;re leaving real money on the table.&lt;/p&gt;
&lt;h2 id=&#34;snowflake-the-hidden-storage-multiplier&#34;&gt;Snowflake: The Hidden Storage Multiplier&lt;/h2&gt;
&lt;p&gt;Snowflake&amp;rsquo;s pricing pitch sounds clean. Pass-through storage at $40/TB/month on-demand, dropping to $23/TB/month with a capacity commitment. Compute as Credits. Done.&lt;/p&gt;
&lt;p&gt;Except it isn&amp;rsquo;t done. Snowflake stores data in immutable micro-partitions, each roughly 16 MB compressed (50 to 500 MB of uncompressed data). Immutable. You can&amp;rsquo;t change them in place. Update a single row in a 1 TB table and Snowflake writes a new copy of the micro-partition holding that row and keeps the old one around.&lt;/p&gt;
&lt;p&gt;Why keep the old one? Two features:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Time Travel:&lt;/strong&gt; query historical states of your data (1 day by default, configurable up to 90 days on Enterprise Edition and above)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fail-Safe:&lt;/strong&gt; a 7-day disaster recovery window on permanent tables that you cannot turn off (transient and temporary tables skip it)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is the part that gets people. A 1 TB table that&amp;rsquo;s getting updated multiple times a day can &lt;a href=&#34;https://select.dev/posts/snowflake-pricing&#34;&gt;balloon to 25 TB of &lt;em&gt;billed&lt;/em&gt; storage&lt;/a&gt; because Snowflake retains every prior version of every micro-partition you&amp;rsquo;ve touched until its Time Travel and Fail-Safe windows expire. Your dashboard says &amp;ldquo;1 TB table.&amp;rdquo; Your invoice says otherwise.&lt;/p&gt;
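&lt;p&gt;To see how a small table turns into a large bill, here&amp;rsquo;s a steady-state Python sketch. It rests on these assumptions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Every micro-partition retired by an update stays billable for the Time Travel window plus the 7-day Fail-Safe window.&lt;/li&gt;
&lt;li&gt;The table churns the same amount every day. The 0.8 TB/day of rewritten partitions is a hypothetical input.&lt;/li&gt;
&lt;li&gt;Storage is priced at the $23/TB capacity and $40/TB on-demand rates quoted above.&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;
```python
FAIL_SAFE_DAYS = 7  # permanent tables


def billed_storage_tb(active_tb: float, rewritten_tb_per_day: float,
                      time_travel_days: int, fail_safe_days: int = FAIL_SAFE_DAYS) -> float:
    """Steady-state billed storage: live data plus retired partitions still retained."""
    retained_tb = rewritten_tb_per_day * (time_travel_days + fail_safe_days)
    return active_tb + retained_tb


def monthly_storage_cost(billed_tb: float, price_per_tb: float) -> float:
    return billed_tb * price_per_tb


for tt_days in (1, 30):
    tb = billed_storage_tb(active_tb=1.0, rewritten_tb_per_day=0.8, time_travel_days=tt_days)
    print(
        f"Time Travel {tt_days:2d} days: {tb:.1f} TB billed, "
        f"${monthly_storage_cost(tb, 23):,.2f} capacity / "
        f"${monthly_storage_cost(tb, 40):,.2f} on-demand"
    )
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To check real tables instead of estimating, Snowflake&amp;rsquo;s &lt;code&gt;TABLE_STORAGE_METRICS&lt;/code&gt; view reports &lt;code&gt;ACTIVE_BYTES&lt;/code&gt;, &lt;code&gt;TIME_TRAVEL_BYTES&lt;/code&gt;, and &lt;code&gt;FAILSAFE_BYTES&lt;/code&gt; per table. Together those show the gap between logical size and billed size.&lt;/p&gt;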
&lt;p&gt;And compute? Virtual Warehouses bill per second, but with a 60-second minimum every single time you resume or resize. Aggressive auto-suspend sounds like a cost optimization. Often it isn&amp;rsquo;t. If you&amp;rsquo;re spinning a warehouse up and down every 30 seconds, you&amp;rsquo;re paying the 60-second minimum every time and quietly multiplying your bill.&lt;/p&gt;
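&lt;p&gt;Here&amp;rsquo;s a small Python sketch of that billing floor. It assumes each resume keeps the warehouse running for the query time plus the auto-suspend delay, and bills at least 60 seconds per resume. It uses 1 credit per hour for a standard X-Small warehouse. The workload of 200 resumes, each with 15 seconds of work and a 30-second auto-suspend, is hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;
```python
MINIMUM_BILLED_SECONDS = 60
XSMALL_CREDITS_PER_HOUR = 1  # standard X-Small warehouse


def billed_seconds(query_seconds: list[float], auto_suspend_seconds: float) -> float:
    """Each resume runs for its query plus the idle auto-suspend delay,
    and is billed for at least 60 seconds."""
    return sum(
        max(q + auto_suspend_seconds, MINIMUM_BILLED_SECONDS) for q in query_seconds
    )


def credits_used(seconds: float, credits_per_hour: float = XSMALL_CREDITS_PER_HOUR) -> float:
    return seconds / 3600 * credits_per_hour


bursts = [15] * 200  # hypothetical: 200 separate resumes, 15 s of work each
work = sum(bursts)
billed = billed_seconds(bursts, auto_suspend_seconds=30)
print(f"query time: {work} s, billed: {billed} s ({billed / work:.0f}x)")
print(f"credits: {credits_used(billed):.2f}")
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Each 45-second session gets rounded up to 60 seconds, so 3,000 seconds of actual query time bills as 12,000 seconds. That&amp;rsquo;s 4x the work, before counting the idle time the auto-suspend delay already adds.&lt;/p&gt;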
&lt;h2 id=&#34;what-id-actually-do&#34;&gt;What I&amp;rsquo;d Actually Do&lt;/h2&gt;
&lt;p&gt;A few things I&amp;rsquo;d put on the wall before signing anything:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Model egress, not storage.&lt;/strong&gt; Run your worst-case query pattern through the calculator. Storage is noise.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Lifecycle everything.&lt;/strong&gt; Cool tiers run roughly half the price of hot storage, and deep archive is roughly 20x cheaper. If your data is older than 90 days and nobody&amp;rsquo;s queried it, it shouldn&amp;rsquo;t be in hot storage.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;For Databricks:&lt;/strong&gt; push every recurring workload to job compute. Audit interactive cluster usage monthly.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;For Snowflake:&lt;/strong&gt; if you have high-frequency update patterns, profile your actual storage footprint, not your logical table size. The gap will surprise you.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;For multi-cloud:&lt;/strong&gt; don&amp;rsquo;t. Egress will eat the savings before you finish the architecture diagram.&lt;/li&gt;
&lt;/ul&gt;
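&lt;p&gt;For the lifecycle bullet, here&amp;rsquo;s a minimal Python sketch of an S3 lifecycle rule. It moves objects under a prefix to Standard-IA after 90 days and to Glacier Deep Archive later. One caveat: S3 lifecycle transitions are based on object age, not on whether anyone has read the object. Access-based tiering is what S3 Intelligent-Tiering is for. The &lt;code&gt;raw/&lt;/code&gt; prefix, the 365-day archive threshold, and the bucket name are hypothetical. The apply step needs boto3 and AWS credentials.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;
```python
def lifecycle_config(prefix: str, ia_after_days: int = 90,
                     archive_after_days: int = 365) -> dict:
    """Build an S3 lifecycle configuration that moves objects under the prefix
    from Standard to Standard-IA, then to Glacier Deep Archive."""
    return {
        "Rules": [
            {
                "ID": f"tier-down-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": archive_after_days, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }


def apply_lifecycle(bucket: str, config: dict) -> None:
    """Push the configuration to a bucket (requires boto3 and AWS credentials)."""
    import boto3

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


config = lifecycle_config("raw/")
# apply_lifecycle("my-data-lake-bucket", config)  # hypothetical bucket name
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that &lt;code&gt;put_bucket_lifecycle_configuration&lt;/code&gt; replaces the bucket&amp;rsquo;s entire existing lifecycle configuration. Merge this rule with any rules you already have before applying it.&lt;/p&gt;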
&lt;p&gt;The vendors all have a story about why their model is the cheap one. Read past the per-GB number on the slide. The bill is somewhere else.&lt;/p&gt;
&lt;p&gt;Happy modeling.&lt;/p&gt;
&lt;h2 id=&#34;sources&#34;&gt;Sources&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://www.dawiso.com/glossary/databricks-pricing-explained-real-cost-breakdown-for-2025&#34;&gt;Databricks Pricing Explained (Dawiso)&lt;/a&gt; — Two-Bill Model, DBU breakdown&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://select.dev/posts/snowflake-pricing&#34;&gt;Snowflake Pricing Explained (SELECT.dev)&lt;/a&gt; — Time Travel storage multiplier, micro-partition behavior&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.finout.io/blog/cloud-storage-pricing-comparison&#34;&gt;Cloud &amp;amp; AI Storage Pricing Comparison 2026 (Finout)&lt;/a&gt; — AWS / Azure / GCP per-GB and tier pricing&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.ai-infra-link.com/s3-vs-gcs-vs-azure-blob-storage-2025-cloud-storage-showdown-performance-pricing-features-compared/&#34;&gt;S3 vs GCS vs Azure Blob Storage (ai-infra-link)&lt;/a&gt; — Egress and API operation pricing&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.cloudzero.com/blog/snowflake-pricing/&#34;&gt;Snowflake Pricing in 2026 (CloudZero)&lt;/a&gt; — Virtual Warehouse 60-second minimum behavior&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I&amp;rsquo;d appreciate a follow. You can subscribe with your email below (the emails go out once a week), or you can find me on Mastodon at &lt;a href=&#34;https://micro.blog/llbbl?remote_follow=1&#34;&gt;@logan@llbbl.blog&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
  </channel>
</rss>