Databricks: How a $62B Data Giant Is Becoming the OS for Enterprise AI
Databricks raised $10B at a $62B valuation with $2.4B in ARR. Here is how the data lakehouse pioneer is evolving into the operating system for enterprise AI.
The Data Lakehouse Becomes the AI Lakehouse
Databricks has always been at the intersection of big data and AI, but its $10 billion Series J round at a $62 billion valuation marks a definitive transformation from a data analytics platform into the operating system for enterprise AI. With $2.4 billion in annual recurring revenue, 7,000 employees, and over 10,000 enterprise customers, Databricks is one of the largest and most consequential AI infrastructure companies in the world.
The round, led by Thrive Capital with participation from Andreessen Horowitz and NVIDIA, was the largest private funding round in AI infrastructure history — and the signal it sends is unmistakable: enterprise AI is no longer a feature or a product category. It is becoming the foundation of how companies operate.
From Apache Spark to AI Lakehouse
Databricks was founded in 2013 by Ali Ghodsi and six other researchers who created Apache Spark at UC Berkeley's AMPLab. Spark revolutionized large-scale data processing by providing a unified engine for batch processing, real-time streaming, machine learning, and graph computation. The founding team saw an opportunity to commercialize this technology and make it accessible to enterprises through a managed cloud platform.
The company's evolution over the past decade mirrors the broader transformation of enterprise technology:
Phase 1 (2013-2018): Data Engineering. Databricks built its initial business around making Apache Spark accessible to enterprises, offering a managed cloud platform that eliminated the complexity of deploying and operating distributed data processing infrastructure.
Phase 2 (2019-2022): The Data Lakehouse. Databricks pioneered the "lakehouse" architecture, combining the best of data warehouses (structured data, SQL analytics, BI tools) with data lakes (unstructured data, flexibility, low cost). Delta Lake, the company's open-source storage layer, became the foundation for this architecture and was adopted by thousands of organizations worldwide.
Phase 3 (2023-present): The AI Lakehouse. The explosion of generative AI created enormous new demand for Databricks' platform. Enterprises realized that training and deploying AI models requires exactly the kind of unified data infrastructure that Databricks provides — bringing together structured business data, unstructured documents and media, real-time streams, and model training and serving in a single platform.
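Delta Lake brought warehouse-style guarantees to a data lake in Phase 2 through a transaction log. Table data lives in immutable Parquet files. An ordered series of JSON commit files under a `_delta_log` directory records which files each version added or removed. The toy sketch below shows that idea in memory. `ToyTableLog` and its methods are hypothetical names for illustration only. This is not Delta Lake's API or on-disk format, and it omits concurrency control, checkpoints, and schema handling.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ToyTableLog:
    """Hypothetical, in-memory toy of a Delta-style transaction log.

    Each commit is one ordered list of actions that add or remove immutable
    data files; the table's contents at any version are found by replaying
    commits in order. Not Delta Lake's real API or file format.
    """
    commits: list = field(default_factory=list)  # index in the list == version number

    def commit(self, add=(), remove=()):
        """Record one change and return its version number."""
        actions = [{"add": {"path": p}} for p in add]
        actions += [{"remove": {"path": p}} for p in remove]
        # The whole commit is stored as a single JSON entry, so a reader
        # replaying the log sees all of its actions or none of them.
        self.commits.append(json.dumps(actions))
        return len(self.commits) - 1

    def snapshot(self, version=None):
        """Return the set of live data files as of `version` (latest if None)."""
        last = len(self.commits) - 1 if version is None else version
        live = set()
        for entry in self.commits[: last + 1]:
            for action in json.loads(entry):
                if "add" in action:
                    live.add(action["add"]["path"])
                else:
                    live.discard(action["remove"]["path"])
        return live


log = ToyTableLog()
v0 = log.commit(add=["part-000.parquet", "part-001.parquet"])
# An update rewrites part-000 as part-002 in a single commit.
log.commit(add=["part-002.parquet"], remove=["part-000.parquet"])
print(sorted(log.snapshot()))            # ['part-001.parquet', 'part-002.parquet']
print(sorted(log.snapshot(version=v0)))  # ['part-000.parquet', 'part-001.parquet']
```

Replaying the log up to an earlier version is what makes "time travel" queries possible. Because each commit is recorded as a single entry, readers never observe half of an update, even though the underlying files are just objects in cheap storage.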
The $2.4 Billion Revenue Machine
Databricks' commercial success is extraordinary by any standard. The company's $2.4 billion ARR represents growth from virtually zero revenue a decade ago, with recent years showing acceleration rather than deceleration. A key metric behind investor confidence is net revenue retention above 140%. This means that revenue from a given cohort of existing customers grows by more than 40% year over year, even after subtracting churn and downgrades, a clear signal of deepening platform dependence. The company serves enterprises across every major industry, with particular strength in financial services, healthcare, retail, and technology.
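Net revenue retention carries much of this argument, so the arithmetic is worth seeing. The sketch below uses one common formulation. Companies define NRR slightly differently, and Databricks' exact method is not described here. The function name and the cohort figures are hypothetical, chosen only so that the result lands at 140%.

```python
def net_revenue_retention(starting_arr, expansion, contraction, churn):
    """Hypothetical helper: NRR for a fixed customer cohort over one period.

    starting_arr: recurring revenue from the cohort at the start of the period
    expansion:    revenue added by those same customers (upsell, more usage)
    contraction:  revenue lost from customers who downgraded
    churn:        revenue lost from customers who left entirely
    Revenue from customers acquired during the period is excluded.
    """
    if starting_arr <= 0:
        raise ValueError("starting_arr must be positive")
    ending_arr = starting_arr + expansion - contraction - churn
    return ending_arr / starting_arr


# Hypothetical cohort figures in $M, not Databricks data.
nrr = net_revenue_retention(100.0, 50.0, 4.0, 6.0)
print(f"{nrr:.0%}")  # 140%
```

New customers are deliberately left out. Including them would mix retention with new sales, and NRR is meant to isolate how much more an existing customer base spends over time.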
Key Investors and What Their Backing Signals
The investor syndicate behind Databricks' $10 billion round reads as a who's who of technology investing:
Thrive Capital led the round, continuing its strategy of backing the most important AI infrastructure companies. Thrive's willingness to write a multi-billion-dollar check reflects deep conviction that Databricks is becoming essential infrastructure for enterprise AI.
Andreessen Horowitz participated in the round, extending its AI infrastructure thesis from model companies to the data platforms that power them. a16z sees Databricks as the "picks and shovels" play in the AI gold rush — a company that benefits regardless of which AI models or applications ultimately win.
NVIDIA, which supplies the GPUs that Databricks customers use for AI workloads, also invested. Its stake reflects the symbiotic relationship between GPU hardware and data platforms: as Databricks customers train more AI models, they consume more NVIDIA hardware.
Competitive Landscape
Databricks' primary competitor is Snowflake, the cloud data warehouse company whose 2020 debut was the largest software IPO in history. The Databricks-Snowflake rivalry has defined enterprise data infrastructure for the past several years, with both companies expanding into each other's territory. Snowflake has added machine learning capabilities and unstructured data support, while Databricks has enhanced its SQL analytics and business intelligence features.
However, the rise of generative AI has shifted the competitive landscape in Databricks' favor. Databricks' lakehouse architecture, which natively handles both structured and unstructured data, is better suited to AI workloads that require diverse data types. The company's open-source ecosystem (Delta Lake, MLflow, Unity Catalog) has created strong developer loyalty and reduced vendor lock-in concerns that plague competing platforms.
The Path to IPO
Databricks is widely considered the most likely major AI company to IPO in the near term. With $2.4 billion ARR, strong growth metrics, a diversified customer base, and a clear competitive position, the company has all the characteristics that public market investors demand. CEO Ali Ghodsi has indicated openness to a public offering, and the $10 billion funding round — which provides ample capital to continue scaling — positions the company to go public on its own timeline rather than out of necessity.
A Databricks IPO at or above its $62 billion private valuation would be one of the largest technology IPOs in history and would provide a critical proof point for the entire AI infrastructure category. It would also provide liquidity to early investors and employees who have built the company over a decade of growth.
Why Databricks Matters for the AI Ecosystem
Databricks' significance extends beyond its own business. The company's success demonstrates that AI is not just about building better models. It is also about building the infrastructure that enables organizations to use AI effectively. Every enterprise deploying AI needs a platform to manage data, train models, serve them for inference, and monitor their performance. Databricks is building that platform, and at a scale that makes it essential infrastructure for the enterprise AI era.
For the venture capital ecosystem, Databricks represents the archetypal "infrastructure winner": a company that benefits from the growth of AI regardless of which specific models, applications, or use cases ultimately prevail. Whether enterprises use OpenAI, Anthropic, open-source models, or custom-trained models, they still need a platform like Databricks to manage the data and infrastructure that power their AI initiatives.