It's an exciting time to be building solutions on Databricks! Databricks announced some very interesting capabilities yesterday at the Data + AI Summit, especially around large language models, plain-English queries, and data governance. Here are some of our favorites.
"English is the hottest new coding language." - Andrej Karpathy (as quoted by Databricks)
MosaicML: Databricks announced its intention to acquire MosaicML, a company specializing in large language models (LLMs), notably its MPT family of models. These models have gained popularity: MPT-7B has been downloaded more than 3.3 million times, and MPT-30B was released recently. MosaicML enables organizations to efficiently build and train their own state-of-the-art models on their own data, without incurring high costs.
LakehouseIQ: LakehouseIQ is a knowledge engine that learns the unique aspects of your business so that everyone in the organization can get accurate answers from their data. It draws on signals from across the Databricks Lakehouse platform (your data, how that data is actually used, and your organizational structure) to build models specialized for your enterprise. As a result, it returns contextually relevant answers that go beyond what a general-purpose LLM can provide on its own. LakehouseIQ powers natural language interfaces throughout Databricks and is also available via APIs for building custom AI applications.
Databricks Assistant: Powered by LakehouseIQ, the Databricks Assistant is a context-aware tool that uses natural language to generate reports, explain and generate code, and answer questions about data and code. Because LakehouseIQ tells it not only where the relevant data is but also how that data is used within the enterprise, the Assistant can pick the right data for each task and return accurate results, saving users significant time on data analysis and code generation.
Lakehouse AI: Whereas LakehouseIQ helps people get answers from their data, Lakehouse AI helps enterprises build generative AI solutions on the platform for their own use cases. This toolkit covers the entire AI lifecycle, from data collection and preparation, through model development and LLMOps, to serving and monitoring. Databricks is expanding Lakehouse AI with vector embedding search to improve generative AI responses; a curated collection of open-source models available in the Marketplace; LLM-optimized model serving; MLflow 2.5, with capabilities such as the AI Gateway and prompt tools; and Lakehouse Monitoring for end-to-end visibility into the data pipelines that drive AI efforts.
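Vector embedding search is easier to picture with a concrete example. The general idea (often called retrieval-augmented generation) is to represent documents and a user's question as embedding vectors, retrieve the documents closest to the question, and pass them to the LLM as context. The Python sketch below illustrates only that general technique using cosine similarity; it is not the Databricks Vector Search API, and the function names and three-dimensional "embeddings" are hypothetical, hand-picked values standing in for what a real embedding model would produce.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int = 1) -> list[str]:
    """Return the ids of the k documents whose embeddings are most similar to the query."""
    ranked = sorted(index, key=lambda doc_id: cosine_similarity(query, index[doc_id]), reverse=True)
    return ranked[:k]


# Hypothetical toy embeddings; real ones come from an embedding model
# and typically have hundreds of dimensions.
index = {
    "refund_policy": [0.9, 0.1, 0.0],
    "shipping_times": [0.1, 0.9, 0.1],
    "office_hours": [0.0, 0.2, 0.9],
}
question = "How do I get my money back?"
query = [0.8, 0.2, 0.0]  # hypothetical embedding of the question above

# Retrieve the closest document and place it in the prompt sent to the LLM.
context = top_k(query, index, k=1)
prompt = f"Answer using only these documents: {context}\nQuestion: {question}"
print(context)  # prints ['refund_policy']
```

A production system would replace the linear scan with an approximate nearest-neighbor index so that retrieval stays fast over millions of documents; the retrieve-then-prompt flow stays the same.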
Lakehouse Federation: Lakehouse Federation lets you discover, query, and govern your data wherever it resides, including in external databases and data warehouses outside Databricks, so you get consistent access and control across storage locations.
AI Governance: Unity Catalog now extends governance to AI assets, adding the Feature Store, the Model Registry, and Volumes (for non-tabular files). Models, features, and files can therefore be managed and governed within the platform alongside tables.
Lakehouse Monitoring and Observability: Lakehouse Monitoring tracks the quality and integrity of all your data and AI assets. For observability, Databricks exposes detailed billing, audit, lineage, and security information as queryable tables, giving you a clear view of how your data assets are used.
Databricks Marketplace: The Databricks Marketplace, an open platform for sharing and exchanging data, analytics assets, and AI models, is now generally available. It gives data providers and consumers a single place to publish, discover, and collaborate on data and AI products.
Lakehouse Apps: Lakehouse Apps offer a secure way to build, distribute, and run data and AI applications directly on the Databricks Lakehouse. They simplify deploying and sharing applications across the organization.