GENERATIVE AI
Databases Adapt for Generative AI and Large Language Models
Databases must adapt to generative AI and seamlessly integrate with large language models to meet evolving enterprise data needs.
6 min read · 1,241 words · Jan 6, 2026
Sailesh Krishnamurthy, VP of Engineering for Databases at Google Cloud, discusses the critical evolution of databases to support generative AI and large language models. He explores challenges like connecting LLMs to operational data, ensuring data security and freshness, and improving natural language to SQL generation. Krishnamurthy highlights Google's solutions, including the MCP Toolbox and secure views, and envisions AI-native databases with hybrid search capabilities and integrated AI functions, transforming how enterprises interact with structured and unstructured data for more relevant results.

The convergence of generative artificial intelligence (AI) and database technology presents significant challenges and opportunities for enterprises. Sailesh Krishnamurthy, Vice President of Engineering for Databases at Google Cloud, has extensively explored how databases must evolve to integrate seamlessly with large language models (LLMs). His work includes leading initiatives to leverage generative AI, specifically Google's Gemini models, in database management systems.
Krishnamurthy recently shared insights into the complexities of bridging the gap between LLMs and operational data. He discussed the difficulties of accurately generating SQL from natural language queries and how Google Cloud's database team is addressing these issues. The discussion also delved into the future of databases, highlighting their transformation to meet the demands of generative AI applications and their users.
Bridging the Gap Between LLMs and Operational Data
The fundamental challenge lies in connecting the vast world knowledge embedded in LLMs with the dynamic, secure, and permissioned operational data within enterprises. While LLMs excel at processing document corpuses for information retrieval, integrating them with structured database content is far more intricate. Enterprise data is not only heavily secured but also constantly changing, making static replication impractical due to data staleness.
Krishnamurthy identifies two primary approaches to connecting LLMs with databases: replication and federation. Replication involves extracting data from the database into another system, which can quickly lead to outdated information. Federation, conversely, involves the LLM or an orchestration mechanism dynamically querying the database on the fly. This real-time interrogation offers greater data freshness and accuracy.
Challenges in Integration
Integrating LLMs with enterprise databases introduces several complexities. Microservices, often used by enterprises to interact with databases, typically provide narrow data apertures. This limits the LLM's view to only a small subset of available data. Expanding this view without compromising security is a significant hurdle.
Security is paramount when connecting LLMs to databases. Enterprise users often interact with systems like Gemini for Enterprise as logged-in individuals, distinct from the service principals that agents use to connect to the database. Ensuring that user permissions are correctly translated and enforced, preventing data exfiltration and unauthorized access, is a tricky but essential aspect of secure integration.
Google's Solutions for LLM-Database Integration
Google Cloud is actively developing solutions to make operational databases more accessible and secure for agentic applications. These solutions range from simple connectivity tools to sophisticated natural language to SQL generation and advanced security measures. The aim is to empower LLMs to interact with databases in a meaningful, secure, and accurate way.
The MCP Toolbox for Databases
One of Google's initial offerings is the MCP Toolbox for Databases, an open-source tool designed to simplify connecting LLMs or orchestration systems to various databases. This toolbox addresses common problems related to connectivity and security, making it easier for AI agents to query operational data. It supports Google's first-party databases and has seen widespread adoption, including by competitors, indicating its utility across the industry.
The MCP Toolbox allows users to add custom tools, essentially templated SQL queries with parameters. LLMs can then select the appropriate tool based on a user's natural language request and generate the necessary parameters. A key challenge, however, is crafting accurate English descriptions for these SQL queries. While engineers excel at writing SQL, translating that into precise natural language descriptions for an LLM to interpret is a different skill set, though AI could potentially assist in this translation. Furthermore, strict security protocols are necessary to control which parameters an LLM is permitted to set.
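To make the idea concrete, a templated tool of this kind might look like the following sketch. The orders table, its columns, and the @customer_id parameter naming are illustrative assumptions, not the actual Toolbox configuration format:

```sql
-- Hypothetical templated tool query. The natural language description is
-- what the LLM matches against; the model supplies only the declared
-- parameter, never the SQL text itself.
-- Description (what the model reads): "List a customer's open orders,
--   newest first. @customer_id identifies the customer."
SELECT order_id, status, total_amount, created_at
FROM orders
WHERE customer_id = @customer_id  -- the one value the model may set
  AND status = 'OPEN'             -- fixed predicate the model cannot alter
ORDER BY created_at DESC
LIMIT 20;
```

The design point is that the model chooses a tool and fills in parameters, but the SQL shape itself is fixed, which bounds what any single request can touch.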
Natural Language to SQL Generation
Beyond pre-defined tools, the "seductive promise" of AI lies in its ability to handle more open-ended queries. This necessitates automatically generating SQL from natural language descriptions, a task fraught with accuracy and security considerations. Google's team has made significant strides in this area, recently achieving a top position on a leading benchmark for natural language to SQL conversion.
Several factors contribute to improving SQL generation accuracy. Providing the LLM with richer context, including detailed schema and metadata, is crucial. Implicit assumptions made during schema design, such as how billing and shipping addresses might relate, are vital pieces of information that, when conveyed to the model, enhance query accuracy. Google's deep integration across its technology stack, from chips to models and databases, allows for close collaboration with teams like Google DeepMind to continually refine the models themselves. This combined approach of better models and improved contextual understanding leads to more precise SQL queries.
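One lightweight way to convey such implicit assumptions is through schema comments that are handed to the model as extra context. The tables and wording below are illustrative, using standard PostgreSQL COMMENT syntax:

```sql
-- Illustrative schema annotations that make implicit design decisions
-- explicit for the model.
COMMENT ON TABLE customer_address IS
  'One row per customer per address role; a customer may use the same street address for both roles.';
COMMENT ON COLUMN customer_address.address_role IS
  'Either BILLING or SHIPPING; questions about invoices should filter on BILLING, questions about delivery on SHIPPING.';
```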
Parameterized Secure Views
Security remains a top concern when enabling LLMs to generate arbitrary queries. Google has developed "parameterized secure views" within the database to address this. These views allow for the definition of robust security policies and barriers, ensuring that an LLM-generated query, even if broadly formulated, will not expose information that a logged-in user is not authorized to see. This technology provides an information-theoretic guarantee against data leakage, preventing unauthorized access while maximizing the utility of LLM-driven queries.
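The exact AlloyDB syntax differs, but the underlying idea can be sketched with standard PostgreSQL security-barrier views: every LLM-generated query runs against the view, and the per-user filter sits below anything the model can write.

```sql
-- Generic sketch of the idea using standard PostgreSQL features (not the
-- exact AlloyDB parameterized-secure-view syntax). The application sets
-- app.current_customer_id for the session; any query through the view can
-- only ever see that customer's rows.
CREATE VIEW my_orders WITH (security_barrier = true) AS
SELECT order_id, status, total_amount
FROM orders
WHERE customer_id = current_setting('app.current_customer_id')::bigint;

-- The agent's role is granted the view, never the base table.
GRANT SELECT ON my_orders TO llm_agent_role;
```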
The Evolution Towards AI-Native Databases
The traditional database model, focused solely on storing data securely and delivering exact results from structured queries, is evolving. Krishnamurthy posits that the future involves databases that handle both structured and unstructured data, prioritizing "most relevant" results over merely "exact" ones. This shift imbues databases with capabilities akin to search engines, emphasizing relevance and ranking.
Vector Indexing and Hybrid Search
A key enabler of this evolution is vector indexing. Enterprises are extracting vector embeddings from unstructured data (images, videos, PDFs) stored in object stores and co-locating them with structured data in databases. This allows for complex queries that combine information from both data types. For instance, an e-commerce platform can search for products not just by image or description but also by price and real-time inventory levels at nearby physical stores.
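A hybrid query of that kind might look like the following sketch, written with pgvector-style syntax; the tables, columns, and the $1 query-embedding parameter are assumptions for illustration:

```sql
-- Illustrative hybrid search: rank products by embedding similarity to the
-- shopper's query while filtering on structured price and live inventory.
SELECT p.product_id, p.name, p.price
FROM products AS p
JOIN store_inventory AS i ON i.product_id = p.product_id
WHERE p.price <= 50.00           -- structured predicate
  AND i.store_id = 1234          -- nearby physical store
  AND i.quantity_on_hand > 0     -- real-time inventory
ORDER BY p.embedding <=> $1      -- $1: embedding of the query text/image
LIMIT 10;
```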
Attempting to stitch together separate vector stores and traditional databases at the application level proves challenging due to varying predicate selectivity per request. Google's solution, adaptive filter vector search, integrates different indexes within a single system. This combines vector search, full-text search, and other structured searches, optimizing for speed and quality. Target.com's implementation of this technology, moving its online catalog and vector search to AlloyDB, resulted in a 50% reduction in "no results" pages and a 20% improvement in business outcomes. This demonstrates a paradigm shift, moving towards iterative improvement in search results, similar to how Google's spellcheck revolutionized web search decades ago.
Multimodal and AI Functions
An AI-native database, from Google's perspective, possesses several key attributes. Firstly, it features AI search, encompassing vector index hybrid search that unifies various search modalities. Secondly, multimodal databases are emerging, such as Google's Spanner, which now offers a graph interface to data. User expectations have shifted, demanding that all data work together to answer broad, general-purpose questions.
Thirdly, AI-native databases integrate AI functions directly within the system. This allows users to leverage LLM technology for specific database operations. For example, a SQL function named "AI" could perform interesting operations on text fields within a database column, such as identifying American brands from a list. This capability enhances the database's ability to process and interpret textual information, providing richer insights.
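As a sketch of what such a per-row function could look like in practice (the name ai_bool and its signature are hypothetical, not a documented API):

```sql
-- Hypothetical AI function: the database sends each prompt to an LLM and
-- interprets the response as a boolean, usable directly in a predicate.
SELECT brand_name
FROM brands
WHERE ai_bool('Is ' || brand_name || ' an American brand? Answer true or false.');
```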
In essence, the progression begins with custom tools, which effectively disaggregate applications into query-issuing agents connected to AI. The next phase involves enabling unanticipated, open-ended questions with high accuracy and robust security. Finally, the database itself becomes more intelligent, capable of handling diverse data types and providing highly relevant results. This continuous evolution marks a transformative era for database technology, positioning it at the core of the generative AI revolution.