Conversational AI refers to AI applications such as chatbots and virtual assistants that interact with users through natural language, answering questions and providing assistance. Because of these language capabilities, the technology is widely used in customer service, sales, marketing, and other business settings.
A report published by Gartner in 2022 predicted that by 2027, Conversational AI chatbots will become the primary customer service channel for roughly a quarter of organizations. It is not surprising, therefore, that a MarketsandMarkets study on Conversational AI found that the market is growing at a CAGR of 22.6% and is projected to reach USD 29.8 billion by 2028, up from USD 10.7 billion in 2023.
Conversational AI systems are primarily built using machine learning technologies. An ideal Conversational AI system not only understands and generates human language but also improves over time by learning from interactions, and it personalizes its responses based on the user it is interacting with.
Evolution of Conversational AI
In the past, Conversational AI systems were built from hand-programmed scripts that relied on decision trees. They were therefore severely limited in their understanding of natural language and often responded with repetitive, scripted interactions, and building and maintaining them was labor-intensive.
However, since highly capable LLMs began emerging, the process of building Conversational AI systems has gone through a complete upheaval. LLMs have a remarkable ability to understand language. They adapt over time and can be built to personalize responses based on past interactions. They can be trained to handle complex queries, and in multiple languages at that. This has transformed how we think about Conversational AI.
Yet, we have only touched the tip of the iceberg. Around mid-2023, a trend of releasing extremely powerful open-source LLMs took hold in the developer community. First came the release of Falcon by TII, with its 40B and 180B variants. Then came Llama 2 by Meta, with its 7B, 13B, and 70B variants. A flood of other powerful LLMs followed, such as Mistral-7B, Mixtral 8x7B, Solar 10.7B, and hundreds of fine-tuned versions of these LLMs for specific domains and use cases. The most cutting-edge trend currently, in 2024, is Multimodal LLMs, that is, LLMs that can handle user queries in text, image, audio, or video, and respond in multiple formats. These are the technologies powering Conversational AI systems today.
Building Conversational AI in the LLM Era
The key aspect of these open LLMs is that they can all be trained and fine-tuned, and combined with an architecture called Retrieval-Augmented Generation (RAG), which small developer teams can harness to create Conversational AI chatbots.
Gone are the days when you needed to program every interaction; that process has been replaced by LLM fine-tuning and by building AI architectures that harness Vector Stores and Knowledge Graphs to ground the LLM in knowledge. Let me explain.
LLM training, or fine-tuning, is a process through which an LLM can be taught to provide responses that are on-brand. Since an open LLM has been trained on internet-scale data, it is essential to fine-tune it for a company's specific needs so that its responses are relevant to the business use case for which the company is building the Conversational AI.
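To make this concrete, here is a minimal sketch of one common fine-tuning approach, LoRA, using the Hugging Face transformers, peft, and datasets libraries. The model name is a real open LLM, but the training file "support_dialogues.jsonl" and its contents are hypothetical placeholders:

```python
# A minimal LoRA fine-tuning sketch using Hugging Face transformers, peft,
# and datasets. The training file "support_dialogues.jsonl" is hypothetical.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "mistralai/Mistral-7B-v0.1"  # any open LLM works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Mistral ships without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Keep the base weights frozen; train only small LoRA adapter matrices.
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Hypothetical JSONL of on-brand support dialogues, one {"text": ...} per line.
data = load_dataset("json", data_files="support_dialogues.jsonl")["train"]
data = data.map(lambda row: tokenizer(row["text"], truncation=True,
                                      max_length=512))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="brand-llm", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=data,
    # Pads batches and shifts inputs into next-token labels for causal LM.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The key design choice here is that only the small adapter matrices are trained, so a team can adapt a 7B-parameter model on a single GPU rather than retraining all the base weights.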
Vector Stores, on the other hand, such as pgvector, Milvus, Chroma, and others, can store and efficiently search through high-dimensional data points called vectors. When we store documents, say company documents, contracts, help documents, and product information, in the form of vectors, we gain the ability to conduct similarity searches or build powerful recommendation systems. Hence, they have become a powerful tool for applications powered by AI and machine learning. They are increasingly used to provide 'context' to LLMs, based on which an LLM generates its responses.
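Here is a minimal similarity-search sketch using Chroma's Python client; the document snippets and the query are made-up examples:

```python
# A minimal similarity-search sketch with Chroma; the document snippets and
# query are made-up examples. Chroma embeds the texts with its default model.
import chromadb

client = chromadb.Client()  # in-memory; use PersistentClient to keep data
docs = client.create_collection(name="company_docs")

# Index a few hypothetical help-desk snippets.
docs.add(
    ids=["doc1", "doc2", "doc3"],
    documents=[
        "Refunds are issued within 5 business days of approval.",
        "Premium plans include 24/7 phone support.",
        "Passwords can be reset from the account settings page.",
    ],
)

# Fetch the snippets most similar to a user query; these become the
# 'context' that is later pasted into the LLM prompt.
results = docs.query(query_texts=["How long do refunds take?"], n_results=2)
print(results["documents"][0])
```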
Knowledge Graphs, another such technology, offer even more advanced capabilities. They are, essentially, structured databases that store information about entities and the relationships between them. We can convert documents into Knowledge Graphs and then use them to provide LLMs with context and factual grounding. By drawing on how concepts are interconnected, LLMs can generate more accurate and informative responses, going beyond simple pattern matching.
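As an illustrative sketch, the snippet below stores a few facts as subject-relation-object triples in a networkx graph and collects the ones that mention a given entity, so they can be pasted into an LLM prompt as grounding. The company, product, and facts are entirely made up:

```python
# An illustrative sketch of a tiny Knowledge Graph built with networkx;
# the company, product, and facts are entirely made up.
import networkx as nx

kg = nx.MultiDiGraph()
# Store facts as (subject) -[relation]-> (object) edges.
kg.add_edge("AcmeBot Pro", "Acme Corp", relation="made_by")
kg.add_edge("AcmeBot Pro", "USD 49/month", relation="priced_at")
kg.add_edge("Acme Corp", "Berlin", relation="headquartered_in")

def facts_about(entity):
    """Collect every stored triple that mentions the given entity."""
    return [f"{subj} {attrs['relation']} {obj}"
            for subj, obj, attrs in kg.edges(data=True)
            if entity in (subj, obj)]

# These lines would be prepended to an LLM prompt as factual grounding.
print("\n".join(facts_about("AcmeBot Pro")))
```

Production systems would use a dedicated graph database rather than an in-memory graph, but the principle is the same: walk the relationships around an entity and hand the resulting facts to the LLM.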
Both Vector Stores and Knowledge Graphs are now being used to give LLMs knowledge about a company so that they can tailor their responses to that specific business. They can also store information about users, products, and past interactions, which can be harnessed to create highly personalized Conversational AI systems.
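Putting the pieces together, here is a minimal end-to-end sketch of the RAG pattern: retrieve relevant context, then ask the LLM to answer using it. The retrieve function is a placeholder standing in for the vector store and knowledge graph lookups above, and the prompt format is illustrative:

```python
# A minimal end-to-end RAG sketch: retrieve context, then generate an answer
# grounded in it. `retrieve` is a placeholder for the vector store and
# knowledge graph lookups shown above; the prompt format is illustrative.
from transformers import pipeline

generator = pipeline("text-generation",
                     model="mistralai/Mistral-7B-Instruct-v0.2")

def retrieve(query: str) -> list[str]:
    # Placeholder: in practice, query the vector store / knowledge graph.
    return ["Refunds are issued within 5 business days of approval."]

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = (f"Answer the customer using only this context:\n{context}\n\n"
              f"Customer: {query}\nAgent:")
    return generator(prompt, max_new_tokens=100)[0]["generated_text"]

print(answer("How long do refunds take?"))
```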
Democratization of Conversational AI through Accelerated Cloud
The democratization of AI would not have been possible without accelerated cloud computing. Building and training LLMs require GPUs. Over the last year, increasingly advanced cloud GPUs like the NVIDIA H100 and A100, and GPU clusters like HGX 4xH100, 8xH100, 64xH100, and 256xH100, have become available for instant access via AI-first cloud platforms.
These cloud GPUs and GPU clusters are designed for building Generative AI technologies like LLMs; without them, the training process would be close to impossible. They accelerate LLM training, cutting it down from months or years to days or hours.
We are still in the early days of this groundbreaking technology. Over the next few years, expect increasingly advanced cloud GPUs to become available to developers. As more capable cloud GPUs reach the market, even more powerful LLMs will emerge, which will in turn lead to increasingly sophisticated Conversational AI systems.