One of the most interesting Generative AI trends is the development of agents powered by large language models (LLMs). The word “agent” denotes an automated system that can perceive its environment and take actions. More specifically, in the context of LLMs, as explained in a recent academic article entitled “AI Agents That Matter,” this word is used mainly when one or several of the following conditions are met:
- The surrounding environment is complex and the potential tasks are open ended.
- The system can be given directives in natural language.
- The system can act with limited or no supervision from a human user.
- The system can use external tools.
- The control flow of the system is dynamic.
The purpose of this blog post is to present some of the most popular open source Python frameworks used to implement LLM-powered agents. We will focus on the following frameworks and we will highlight their similarities as well as their differences (and summarize them in a table in the appendix):
- LangGraph: A general, low-level library of the LangChain ecosystem
- LlamaIndex: A framework specialized in the implementation of retrieval-augmented LLM pipelines
- AutoGen: A framework that models an LLM application as a conversation between multiple agents
- CrewAI: A multi-agent framework that emphasizes user-friendliness
Features Common to All Frameworks
Let’s start with the similarities between these frameworks. First, they all offer the possibility to implement one or several predefined execution logics. The execution logic defines the sequence of LLM completion queries, tool invocations, and user interactions that allows an agent to complete its task. ReAct is a popular example of agent execution logic. All frameworks above also allow developers to create customized execution logics.
Another key feature is the possibility to define tools that agents can use to interact with their environment. From the perspective of an agent, a tool is simply a function that is described in the prompt, can be called with certain input parameters and returns a text output. Typical tools are search engines, web browsers, code interpreters, or messaging services. All frameworks let the developers use off-the-shelf tools and easily create customized tools.
They also include at least some human-in-the-loop features. Agents are prone to errors, especially if they need to follow a relatively high number of steps to perform a task. This is especially worrisome if the agents use tools which not only return information but also trigger side effects (e.g., an administrative task which affects the account of a customer). In this context, it may be critical to involve a human user, either to ask clarifying questions on the requested task or to request a confirmation before taking a sensitive action.
On the technical side, all the frameworks leave users the flexibility to easily switch from one LLM API to another. They offer an integration with the OpenAI API and, by extension, they allow developers to connect to the increasingly common OpenAI-compatible APIs. Furthermore, it is relatively straightforward to define custom LLMs and there are also several other LLM integrations available.
Finally, an unfortunate commonality among the four frameworks is that they offer limited agent evaluation features, if any. Depending on the use case, we may want to only evaluate the final answer provided by an agent or its whole trajectory, including the intermediate steps (especially if tools with side effects are used), and we may have access to a ground truth or not. The four frameworks allow developers to fully record the agents’ trajectories but none of them include evaluation features that encompass these four potential scenarios.
Single Agent or Multi-Agent Systems?
CrewAI and AutoGen are presented as multi-agent frameworks while LlamaIndex focuses on single agent systems, and LangGraph enables both approaches. In a multi-agent system, several specialized agents collaborate to reach a common objective. The execution logic of the system is then framed as a “conversation” between the agents. The agents can, for example, “speak” one after the other or a “manager” agent can give the floor to the agents when relevant.
It makes sense to replace a single agent with multiple agents when it would be impossible or impractical to impart a single agent with all the capabilities required for the use case at hand. For instance, a prompt that includes the instructions and examples for all the target capabilities might be exceedingly complex for the LLM or simply too large for its context window. While managing a single agent is conceptually simpler, combining multiple agents is more modular: We can develop and test independent specialized agents and combine them to better scale with the complexity of a use case.
Single agent frameworks are actually also able to model multiple agents because an agent can be packaged as a tool that is then made available to another agent. In this way, the distinction between single agent frameworks and multi-agent frameworks lies more in the developer experience than in their expressiveness.
Distinctive Features of Agent Frameworks
We now focus on the specific aspects of each framework covered in this blog post.
LangGraph has recently become the preferred way of implementing agents in the popular LangChain ecosystem. LangChain includes some agent-related features but many of them are deprecated and planned to be removed in the near future. The promise of LangGraph is the possibility to implement arbitrary agent execution logic in a simple manner. This execution logic is declared as a graph which specifies how the state of the agent evolves over time. For example, the graph below corresponds to a tool-augmented agent that requests the validation of a human user for certain sensitive tools. The key features of LangGraph are:
- The control the behavior of the agent in a very fine-grained manner
- The seamless integration in the LangChain ecosystem (in particular, the very large number of available built-in tools)
- The creation of checkpoints that capture the past and present states of the agent for monitoring and error recovery purposes
- The evaluation of agents’ trajectories, to assess the relevance of individual actions (which does not, however, cover the case when a ground truth for the full trajectory is available)
LlamaIndex is a framework specialized in the creation of LLM applications that rely on external data sources. LlamaIndex gives the possibility to create agents among many options to better leverage these data sources. Please note that the LlamaIndex team is currently building llama-agents, a “powerful framework for building production multi-agent AI systems” but we do not cover it in this blog post because it is in an early stage. Some of the most distinctive agent features of LlamaIndex are:
- Several pre-configured agents that can directly be used (function calling agent, ReAct agent, and more advanced agent algorithms like Language Agent Tree Search or Chain-of-Abstraction)
- The compatibility with other LlamaIndex components and, in particular, the possibility to create tools based on LlamaIndex data query engines
AutoGen is a multi-agent framework developed by Microsoft. Its central high-level abstraction is the conversation, which corresponds to the execution logic of the multi-agent system. Noteworthy features of AutoGen include:
- A wide range of conversation patterns (e.g., group chat with multiple agents, nested chats, speaker selection, etc.) that gives flexibility to create agent workflows of various complexities
- Several options to involve one or several human users
- A built-in code execution component that offers several options (local Shell, Docker Shell, Jupyter Kernel) to let an agent execute automatically generated code
CrewAI is a developer-friendly multi-agent framework. It represents multi-agent systems as “crews” of specialized agents and its abstractions reflect notions applicable to human teams, such as tasks, processes, collaboration, management, planning, delegation, etc. CrewAI notably offers the following features:
- Definition of the agents and their interactions in a simple yet flexible manner
- Management of the memory of the agents with four options (short-term memory, long-term memory, entity memory, contextual memory)
- Compatibility with LangChain LLMs, LangChain tools, and LlamaIndex tools
Conclusion
Although the use of LLM-powered agents is new in real-life use cases, there are already solid options to start implementing agent applications. In this blog post, we presented four agent frameworks with distinct strengths. At this stage, our preliminary generic recommendations would be:
- Use LangGraph when the need to define the execution logic in a very fine-grained manner is paramount (for example, in the case of complex human-in-the-loop interactions).
- Use LlamaIndex for data-augmented LLM applications (e.g., question answering applications), especially when speed of development and maintainability trump flexibility.
- For tasks that are naturally decomposed in several, well delineated sub-tasks, use CrewAI or AutoGen depending on the desired balance between speed of development and precise control.
Whatever framework is chosen and given the LLMs currently available, agents will in any case remain brittle and will need to be carefully designed, tested, and monitored. As illustrated in the Dataiku LLM Starter Kit, Dataiku can support the development of agent applications through various features:
- LangChain integration which ensures all LLMs accessible through Dataiku can be used with LangGraph, LlamaIndex, and CrewAI while a custom LLM can be created for AutoGen
- Function calling support
- LLM evaluation (Early Adopter Program)
- Programmatic access to the various objects in Dataiku through the Python API, which facilitates the creation of custom tools
- Easy creation and operation of web user interfaces
- Experiment tracking with MLFlow.