The impressive capabilities of large language models (LLMs) have not only caught the attention of general users but also led to increased interest in enterprise applications, especially knowledge-intensive tasks. LLMs have achieved notable success in 1) understanding data in structured, unstructured, and other modalities, and 2) facilitating seamless integration of other tools such as web search, databases, or even other predictive machine learning (ML) models through in-context learning. Moreover, LLMs have shown promise in planning and decomposing complex tasks into smaller, easily executable subtasks.
AI systems, specifically Compound AI (CAI) systems, are gaining traction and leading to further improvements in LLMs’ capabilities on complex tasks. CAI systems piece together multiple components (one or more agents composed of LLMs with varying capabilities) along with other tools. This enables CAI systems to achieve state-of-the-art results on a wide variety of tasks, especially enterprise-grade, knowledge-intensive tasks. The success of CAI systems is typically realized in two stages: 1) Orchestration of a task workflow, which decomposes the task into (reusable) modules backed by an LLM or by other tools such as a database, knowledge graph, or ML models. 2) Optimization of modules, independently or jointly, to maximize a task-specific performance metric such as accuracy.
In this article, we focus on opportunities for optimization in CAI systems, namely multi-objective, multi-plan, and constrained optimization: opportunities that go beyond the current state of the art. First, we briefly walk through workflow orchestration to set the stage for optimization.
Orchestration in Compound AI systems
CAI systems piece together different components to orchestrate a task workflow. Components are typically implemented as reusable modules, and a pipeline chains the components together to create a workflow. Modules’ complexity can vary with the task: a module can be as simple as Python code that returns the length of a list, or it can implement more complex functionality such as retrieval-augmented generation (RAG). AI systems like AlphaCode2 employ a predefined set of components for code generation (via policy models and sampling), filtering (removing irrelevant programs), and scoring (ranking the candidate programs).
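As a toy illustration of this modular view (the names below are ours, not from any specific framework), a module can be any callable and a pipeline simply a chain of modules, each one's output feeding the next one's input:

```python
# Toy sketch: modules as plain callables, a pipeline as a chain of modules.
# All names (list_length, mock_retrieve, Pipeline) are illustrative.

def list_length(items):
    # A trivially simple module: return the length of a list.
    return len(items)

def mock_retrieve(query):
    # Stand-in for a retrieval module; a real RAG module would query an index.
    corpus = {"capital of France": "Paris is the capital of France."}
    return corpus.get(query, "")

class Pipeline:
    """Chains modules so each module's output feeds the next one's input."""
    def __init__(self, *modules):
        self.modules = modules

    def __call__(self, x):
        for module in self.modules:
            x = module(x)
        return x

# Compose retrieval with simple post-processing modules.
rag_lite = Pipeline(mock_retrieve, str.split, list_length)
result = rag_lite("capital of France")  # token count of the retrieved passage
```

Real frameworks add typed inputs/outputs, error handling, and DAG (rather than linear) composition, but the reusable-module idea is the same.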
Building CAI systems from the ground up is a tedious task because of the extensive design possibilities and their corresponding decisions. Frameworks like DSPy, AutoGen, and Agentforce are a great attempt at addressing some of the complexities involved in building a CAI system by decoupling workflow design from workflow execution. In DSPy or AutoGen Studio, modules can be developed easily, just by declaratively specifying the task description and the input and output constraints. The DSPy compiler then compiles this specification into a prompt suited to the task and the underlying LLM. An optional but very helpful step is to optimize the prompt against a task-specific dataset.
import dspy

class QA(dspy.Signature):
    """Provide an answer for the question."""

    question = dspy.InputField()
    answer = dspy.OutputField(desc="answer in 3 to 4 words")
Declarative specification of QA task in DSPy
Why do we need optimization?
Optimization has played a key role in several systems, such as databases and AutoML. In the context of CAI systems, where several components are put together, identifying the right set of (hyper)parameters for each component is fundamental not only to maximizing accuracy but also to reducing cost, which plays a critical role in deploying LLMs at enterprise scale. For instance, consider a simple CAI system, RAG, with two components: a retriever and an LLM. Given the many available LLMs with varying degrees of accuracy, response time, and cost, it is challenging to pick the LLM that satisfies the user’s budget constraints while maximizing performance on a given task. This problem is further exacerbated in enterprise workflows that involve an intricate interplay of domain data, LLMs, and tools.
Optimization in Compound AI systems
Optimization methods in CAI systems broadly operate at two levels: module-level optimization, where the prompts and demonstrations of a single LLM are optimized for a given task, and pipeline-level optimization, which jointly optimizes a set of modules connected via a directed acyclic graph (DAG).
- Module-level optimization
Automatic prompt optimization (APO) has gained a lot of traction recently, with the goal of finding the best prompt (instruction + examples) for a given task. Recent work demonstrated that, across over 20 different tasks, optimizing both instructions and exemplars could yield a 16% absolute improvement in test accuracy over unoptimized instructions/exemplars with the PaLM 2 model. Although this study targets prompt optimization for a single LLM, it provides valuable insights into how prompt optimization can yield better performance, given that most LLMs available today are accessible only in inference mode. Interested readers who want to learn more about different optimizers for instruction and exemplar selection can check out this comprehensive survey.
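The core idea behind these methods can be sketched as a search over instruction and exemplar combinations, scored against a labeled dev set. In the toy sketch below, the candidate prompts and the scoring function are stand-ins of our own; a real optimizer would run the LLM on each candidate and measure dev-set accuracy:

```python
import itertools

# Schematic joint search over instructions and exemplar sets (the essence
# of APO). Candidates and the metric are illustrative stand-ins.

instructions = [
    "Answer the question.",
    "Answer the question concisely, citing the passage.",
]
exemplar_sets = [
    (),                                                   # zero-shot
    ("Q: 2+2? A: 4",),                                    # one-shot
    ("Q: 2+2? A: 4", "Q: capital of France? A: Paris"),   # two-shot
]

def dev_set_score(instruction, exemplars):
    # Stub metric: pretend more detailed instructions and more exemplars
    # help. A real system would evaluate LLM accuracy on a dev set here.
    return len(instruction) / 100 + 0.1 * len(exemplars)

best_instruction, best_exemplars = max(
    itertools.product(instructions, exemplar_sets),
    key=lambda cand: dev_set_score(*cand),
)
```

Practical optimizers replace the exhaustive `itertools.product` loop with smarter strategies (Bayesian optimization, LLM-proposed rewrites), since the candidate space grows combinatorially.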
- Pipeline-level (a.k.a. multi-stage) optimization
APO methods that target a single LLM, while useful, are limited, especially for the complex enterprise pipelines that comprise multiple LLMs. Optimizing each LLM module in a pipeline individually with APO methods is certainly an easy option to consider, but it is suboptimal with respect to end-to-end performance, and obtaining labeled data for intermediate LLM modules is even more challenging. Recently, a generic optimization framework was introduced to solve multi-stage optimization. Multi-stage optimization is known to be intractable because the optimization space grows exponentially with the number of modules in the pipeline. To this end, a few approximate, tractable optimizers such as MOPRO (an extension of OPRO), MIPRO (based on Bayesian optimization), and POPRO (a program-level OPRO) have been proposed. Another line of work introduced a metric called “Reasoning Capacity” to quantify a CAI system’s ability to perform a task based on the input, constraints, and generated output. Following an information-theoretic approach, the reasoning capacity of a system executing a task with a plan p is proportional to the mutual information between output and input under p, divided by the mutual information between output and input under all possible plans.
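Under our reading of that description (the notation here is ours, as a sketch), with input X, output Y, plan p, and plan space P, the ratio can be written as:

```latex
\mathrm{RC}(p) \;\propto\; \frac{I(X;\, Y \mid p)}{I(X;\, Y \mid \mathcal{P})}
```

Intuitively, a plan with higher reasoning capacity preserves more of the task-relevant information between input and output than the plan space does on average.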
Path Forward
The problem of optimization in CAI systems has received a lot of attention from both academia and industry, as it increases the viability of CAI systems in enterprise settings. Significant progress has been made, resulting in established principles for optimization and state-of-the-art performance on several tasks. However, much progress remains to be made. Below are three opportunities that, while not all-encompassing, point toward clear next steps for optimization.
Opportunity 1: Multi-objective optimization
Although a few independent, task-specific AI pipelines optimize for accuracy alongside other objectives, the majority of CAI optimization efforts thus far focus on accuracy alone. Accuracy is a crucial metric, but not the only one that matters for the effective deployment of CAI systems in practice. Other metrics, such as cost and latency, play an equally important role. We believe that moving from single-objective to multi-objective optimization is an important and essential opportunity, and a challenge to be addressed.
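One natural building block for multi-objective optimization is Pareto filtering: discard any configuration that another configuration beats on every objective at once. The sketch below does this over a hypothetical LLM catalog (all names and numbers are illustrative):

```python
# Sketch of multi-objective selection: keep only Pareto-optimal LLM configs
# over (accuracy, cost, latency). The catalog below is hypothetical.

candidates = {
    # name: (accuracy, cost in $ per 1K tokens, latency in seconds)
    "large-llm":  (0.92, 0.060, 2.5),
    "medium-llm": (0.88, 0.020, 1.2),
    "small-llm":  (0.80, 0.004, 0.4),
    "weak-llm":   (0.78, 0.010, 1.5),  # worse than small-llm on all three
}

def dominates(a, b):
    """True if a is at least as good as b on every objective (higher
    accuracy, lower cost, lower latency) and strictly better on one."""
    acc_a, cost_a, lat_a = a
    acc_b, cost_b, lat_b = b
    at_least = acc_a >= acc_b and cost_a <= cost_b and lat_a <= lat_b
    strictly = acc_a > acc_b or cost_a < cost_b or lat_a < lat_b
    return at_least and strictly

pareto = {
    name for name, obj in candidates.items()
    if not any(dominates(other, obj)
               for other in candidates.values() if other != obj)
}
```

The surviving Pareto set is what a downstream policy (or a user-supplied budget) would then choose from; a scalarized weighted sum of the objectives is the other common approach.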
Opportunity 2: Multi-plan optimization
In many scenarios, especially in enterprise settings, a task can be solved by multiple alternative plans (or programs) produced by a planner agent. This opens an interesting opportunity for multi-plan optimization: the optimization scope expands from optimizing a fixed set of modules to optimizing over a set of plans. Although this creates an extremely large search space, taking cues from classical optimization approaches such as branch and bound can help address the multi-plan optimization problem.
Opportunity 3: Constrained optimization
Building on opportunities 1 and 2, many applications require a task to be accomplished under certain constraints, such as a budget. Such constraints require new methods to find optimal execution plans, i.e., a set of modules and their optimal hyperparameters, that satisfy the constraints.
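In its simplest form, over a small candidate set, constrained selection reduces to maximizing accuracy subject to a cost ceiling; the sketch below (with hypothetical configurations and numbers) illustrates the shape of the problem before any clever search is needed:

```python
# Sketch of budget-constrained plan selection: pick the highest-accuracy
# configuration whose estimated cost fits the budget. Numbers hypothetical.

configs = [
    # (plan description, estimated accuracy, estimated $ per 1K queries)
    ("large-llm, dense retriever",  0.93, 45.0),
    ("large-llm, bm25 retriever",   0.90, 30.0),
    ("medium-llm, dense retriever", 0.88, 18.0),
    ("small-llm, bm25 retriever",   0.80,  4.0),
]

def pick_under_budget(configs, budget):
    """Return the most accurate feasible config, or None if none fits."""
    feasible = [c for c in configs if c[2] <= budget]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c[1])

choice = pick_under_budget(configs, budget=20.0)
```

Once the candidate space becomes the cross-product of plans and module hyperparameters, this brute-force filter no longer scales, which is exactly where constrained optimization methods are needed.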
Conclusions
Compound AI systems are quickly gaining popularity because of their ability to integrate other tools with LLMs, demonstrating that “more is better than one.” However, a naïve integration might be counterproductive, leading to performance worse than that of monolithic systems. Recent efforts demonstrate that applying optimization to CAI system workflows has been particularly successful across multiple tasks. However, effective and successful deployment of CAI systems in enterprise scenarios needs a principled approach to optimizing workflows. In particular, we have echoed throughout this blog that an optimization framework should pursue broader goals: multi-objective optimization (accuracy, cost, latency, etc.), multi-plan optimization, and handling constraints, especially budgets. These optimization goals are not comprehensive, but they are critical for enterprise scenarios.
Written by Sairam Gurajada and Megagon Labs