
What is OpenDevin and what problems does it solve for you?


Software development is becoming increasingly complex. From managing intricate codebases to deploying applications across various platforms, developers face a multitude of challenges. OpenDevin is here to revolutionize how we approach software development. This innovative framework empowers developers to create intelligent agents that can automate and enhance various development tasks. By harnessing the power of AI, OpenDevin not only boosts efficiency but also frees developers to focus on higher-level problem-solving and innovation.

OpenDevin: The Future of AI-Driven Software Development

OpenDevin is an open-source AI platform designed to function as an autonomous software engineer. It can handle a variety of software engineering tasks, such as writing and debugging code, project management, and real-time collaboration with human developers. The primary goal of OpenDevin is to utilize AI to enhance and simplify the development process, making it more efficient and accessible for all users.

What is OpenDevin used for?

OpenDevin is a cutting-edge platform powered by AI and large language models (LLMs) for autonomous software engineers. OpenDevin agents collaborate seamlessly with human developers to create code, resolve bugs, and integrate new features. This project operates as a fully open-source initiative, empowering users to utilize and customize it according to their unique requirements and preferences. OpenDevin serves as a self-sufficient AI software engineer, equipped to handle complex engineering tasks and actively engage in software development projects.

Key Features of OpenDevin

  1. Natural Language Understanding: One of the standout features of OpenDevin is its ability to understand and interpret natural language instructions. This enables developers to communicate their ideas and requirements in simple English, and the platform seamlessly converts these into clean code, making the development process more intuitive and user-friendly.
  2. Comprehensive Development Tools: OpenDevin comes equipped with a variety of robust tools designed to improve the software development workflow:
    • Chat Interface: This feature allows for real-time communication with the AI, making it easy to resolve issues and ask for assistance.
    • Command Terminal: Allows execution of commands within the AI environment, streamlining task management.
    • Workflow Planner: Helps organise projects, set milestones, and optimise the development process with intelligent planning features.
  3. Real-Time Interaction: OpenDevin facilitates real-time interaction and monitoring, creating a more dynamic development environment. Developers can view immediate results and make adjustments as needed, enhancing engagement throughout the development process.

Installation and Setup

To get started with OpenDevin, users need to meet certain prerequisites, including:

  • Linux, Mac OS, or Windows with WSL
  • Docker (version 26.0.0+ recommended)
  • Python (version 3.10 or higher)
  • NodeJS (version 14.8 or higher)

The installation process involves the following steps:

  • Clone the Repository: Start by cloning the OpenDevin repository from GitHub.
  • Set Up Docker: Prepare the Docker environment to facilitate the platform’s functioning.
  • Run Initialization Commands: Execute specific commands to initialize OpenDevin.

Once these steps are completed, users will be ready to leverage the powerful features of OpenDevin in their software development projects.
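As a rough sketch, these steps typically map to commands like the following. The make targets reflect the project's development guide at the time of writing and may differ between releases:

# Clone the repository
git clone https://github.com/OpenDevin/OpenDevin.git
cd OpenDevin

# Build the backend and frontend (assumes GNU make, Docker, Python, and NodeJS are installed)
make build

# Configure the LLM provider and API key interactively
make setup-config

# Start the application
make run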

⚡ Getting Started

OpenDevin operates optimally with Docker version 26.0.0 or higher (Docker Desktop 4.31.0 or above). It is compatible with Linux, Mac OS, or Windows via WSL only.

To initiate OpenDevin within a Docker container, run the following commands in your terminal:

Warning

Please be aware that running this command may modify or delete files in the ./workspace directory.

WORKSPACE_BASE=$(pwd)/workspace
docker run -it \
    --pull=always \
    -e SANDBOX_USER_ID=$(id -u) \
    -e WORKSPACE_MOUNT_PATH=$WORKSPACE_BASE \
    -v $WORKSPACE_BASE:/opt/workspace_base \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name opendevin-app-$(date +%Y%m%d%H%M%S) \
    ghcr.io/opendevin/opendevin


Note

By default, this command pulls the latest tag, which represents the most recent release of OpenDevin. However, there are alternative options available:

  • For a specific release version, use ghcr.io/opendevin/opendevin:<OpenDevin_version> (replace <OpenDevin_version> with the desired version number).
  • For the most up-to-date development version, use ghcr.io/opendevin/opendevin:main. This version may be unstable and is recommended for testing or development purposes only.

Choose the tag that best matches your needs based on stability requirements and desired features.
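For example, to pull a pinned release or the development build ahead of time (replace the version placeholder with a real release number):

docker pull ghcr.io/opendevin/opendevin:<OpenDevin_version>   # pinned release
docker pull ghcr.io/opendevin/opendevin:main                  # development build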

You’ll find OpenDevin running at http://localhost:3000 with access to ./workspace. To have OpenDevin operate on your code, place it in ./workspace. OpenDevin will only have access to this workspace folder, ensuring that the rest of your system remains unaffected as it operates within a secure Docker sandbox.

Upon launching OpenDevin, you will need to select the appropriate Model and enter the API Key within the settings window, which should pop up automatically. These can be set at any time by selecting the Settings button in the UI. If the desired Model does not exist in the list, you can manually enter it into the provided text box.

🤖 LLM Backends

OpenDevin is designed to be compatible with any LLM (Large Language Model) backend. For a comprehensive list of available language model providers and the specific models they offer, please check the providers documentation.

Warning

OpenDevin will issue numerous prompts to the LLM you configure. Since most of these LLMs incur costs, it is crucial to set spending limits and monitor your usage to avoid unexpected charges.

The LLM_MODEL environment variable dictates which model is utilized in programmatic interactions. However, when using the OpenDevin UI, you’ll need to select your desired model in the settings window.

Additionally, the following environment variables might be necessary for some LLMs:

  • LLM_API_KEY
  • LLM_BASE_URL
  • LLM_EMBEDDING_MODEL
  • LLM_EMBEDDING_DEPLOYMENT_NAME
  • LLM_API_VERSION

We have a few guides for running OpenDevin with specific model providers; the sections below cover several of them.

Exploring OpenDevin: Capabilities, Implementation, and Limitations

OpenDevin is a versatile tool designed to work with Large Language Models (LLMs) from a variety of providers. It allows developers to easily integrate powerful LLM capabilities into their applications using environment variables like LLM_API_KEY, LLM_BASE_URL, LLM_EMBEDDING_MODEL, LLM_EMBEDDING_DEPLOYMENT_NAME, and LLM_API_VERSION. Here, we'll explore how OpenDevin works with different model providers (Ollama, Azure, and Google), highlighting their specific tasks and usage.

1. Ollama

Ollama offers an API for running and managing LLMs locally. Implementing OpenDevin with Ollama involves setting the environment variables mentioned above to point at the Ollama endpoint and the model name specific to your use case.

The primary tasks that Ollama can manage include:

  • Text generation
  • Translation
  • Summarization
  • Code generation

This setup allows the models to be easily accessible for various applications, such as chatbots, automated content creation, and more.
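As a minimal sketch, wiring OpenDevin to a local Ollama server might look like the following, reusing the Docker command from the Getting Started section. It assumes Ollama is listening on its default port 11434 and that the model named in LLM_MODEL has already been pulled; adjust both to your setup:

# Assumes a local Ollama server on its default port and a pulled model;
# "ollama/llama3" is an illustrative model name.
WORKSPACE_BASE=$(pwd)/workspace
docker run -it --pull=always \
    -e SANDBOX_USER_ID=$(id -u) \
    -e WORKSPACE_MOUNT_PATH=$WORKSPACE_BASE \
    -e LLM_API_KEY="ollama" \
    -e LLM_BASE_URL="http://host.docker.internal:11434" \
    -e LLM_MODEL="ollama/llama3" \
    -v $WORKSPACE_BASE:/opt/workspace_base \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    ghcr.io/opendevin/opendevin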

2. Azure OpenAI

Azure provides strong support for LLMs through its OpenAI service, which includes models such as GPT-3 and Codex. To use OpenDevin with Azure, you need to configure the Azure-specific environment variables. The implementation process typically involves:

  • Setting up Azure OpenAI service: This requires creating an Azure account, establishing an OpenAI resource, and obtaining the necessary API keys.
  • Configuring OpenDevin: Use the designated environment variables to connect OpenDevin to the Azure OpenAI endpoints.

Azure’s LLM capabilities are extensive, covering text and code generation, language translation, and complex data summarization tasks. Furthermore, Azure provides comprehensive documentation and support, making it easier for developers to effectively integrate and utilize these services.
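As a minimal sketch, assuming an Azure OpenAI resource and a model deployment already exist (every value below is a placeholder for your own resource; the azure/ model prefix follows LiteLLM conventions):

# All values are placeholders for your own Azure resource.
export LLM_MODEL="azure/<your-deployment-name>"
export LLM_API_KEY="<your-azure-api-key>"
export LLM_BASE_URL="https://<your-resource-name>.openai.azure.com"
export LLM_API_VERSION="<api-version>"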


3. Google Gemini/Vertex LLM

Completion

OpenDevin uses LiteLLM for completion calls. The following resources are relevant for using OpenDevin with Google's LLMs:

Gemini – Google AI Studio Configs

To use Gemini through Google AI Studio when running the OpenDevin Docker image, you’ll need to set the following environment variables using -e:

GEMINI_API_KEY="<your-google-api-key>"
LLM_MODEL="gemini/gemini-1.5-pro"

Vertex AI – Google Cloud Platform Configs

To use Vertex AI through Google Cloud Platform when running the OpenDevin Docker image, you’ll need to set the following environment variables using -e:

GOOGLE_APPLICATION_CREDENTIALS="<json-dump-of-gcp-service-account-json>"
VERTEXAI_PROJECT="<your-gcp-project-id>"
VERTEXAI_LOCATION="<your-gcp-location>"
LLM_MODEL="vertex_ai/<desired-llm-model>"

4. Google PaLM API

Google’s PaLM API is designed for a wide range of generative AI tasks. To implement OpenDevin with Google’s PaLM, follow these steps:

  • Creating a Google Cloud Platform account: This is the first step to access the PaLM API.
  • Generating API keys: These keys are essential for authentication and usage.
  • Installing necessary libraries: For instance, you might need the PaLM API client library and other dependencies.
  • Setting environment variables: Configure OpenDevin to utilize Google’s API endpoints and deployment names.

The PaLM API supports tasks such as text generation, embedding creation, data augmentation with synthetic data, and model tuning. Google’s MakerSuite further simplifies the workflow by allowing developers to prototype and iterate on their models quickly. This suite supports various programming languages and provides tools for managing and manipulating synthetic data, which is vital for improving AI model performance.
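As a hedged sketch of the last two steps, LiteLLM's PaLM integration conventionally uses a PALM_API_KEY and model names prefixed with palm/; verify the exact names against the LiteLLM documentation for your version:

# PALM_API_KEY and the palm/ prefix follow LiteLLM conventions; verify for your version.
export PALM_API_KEY="<your-palm-api-key>"
export LLM_MODEL="palm/chat-bison"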

🧠 Agents and Capabilities

CodeAct Agent

Description

This agent implements the CodeAct concept (paper, tweet) that consolidates LLM agents’ actions into a unified code action space for both simplicity and performance (see paper for more details).

The core functionality of the CodeAct Agent can be summarized as follows:

At each turn, the agent can:

  1. Converse: Engage in natural language communication with users to seek clarification, confirmation, etc.
  2. CodeAct: Choose to perform the task by executing code
  • Execute any valid Linux bash command
  • Execute any valid Python code with an interactive Python interpreter. This is simulated through a bash command; see the plugin system below for more details.

Plugin System

To make the CodeAct agent more powerful with access to only the bash action space, the CodeAct agent leverages OpenDevin's plugin system, for example a Jupyter plugin that provides the IPython execution mentioned above.

Demo

https://github.com/OpenDevin/OpenDevin/assets/38853559/f592a192-e86c-4f48-ad31-d69282d5f6ac

Example of CodeActAgent with gpt-4-turbo-2024-04-09 performing a data science task (linear regression)

Actions

Action, CmdRunAction, IPythonRunCellAction, AgentEchoAction, AgentFinishAction, AgentTalkAction

Observations

CmdOutputObservation, IPythonRunCellObservation, AgentMessageObservation, UserMessageObservation

Methods

  • __init__: Initializes the agent with an llm and a list of messages (list[Mapping[str, str]]).
  • step: Performs one step using the CodeAct Agent. This includes gathering information on previous steps and prompting the model to produce a command to execute.

The PlannerAgent is another agent provided by OpenDevin. It is responsible for creating and executing long-term plans to solve complex problems by leveraging a language model (LLM). Let's break down its key components and methods:

Description

The planner agent is designed to:

  1. Create Long-Term Plans: It uses a specialized prompting strategy to develop comprehensive plans for problem-solving.
  2. Leverage Context: It considers previous action-observation pairs, the current task, and hints based on the last action at each step.
  3. Execute Actions: The agent is capable of performing various actions, such as running commands, reading or writing files, browsing URLs, interacting with GitHub, and more.

Actions

The agent can carry out the following actions:

  • NullAction: No action is taken.
  • CmdRunAction: Executes a command in the terminal.
  • BrowseURLAction: Opens and interacts with a URL.
  • GithubPushAction: Pushes changes to a GitHub repository.
  • FileReadAction: Reads a file’s content.
  • FileWriteAction: Writes content to a file.
  • AgentThinkAction: The agent performs internal reasoning or thinking.
  • AgentFinishAction: Marks the completion of the current task.
  • AgentSummarizeAction: Summarizes the current state or progress.
  • AddTaskAction: Adds a new task to the plan.
  • ModifyTaskAction: Modifies an existing task in the plan.

Observations

The agent can interpret the following observations:

  • Observation: A general observation.
  • NullObservation: No observation.
  • CmdOutputObservation: Output from a command execution.
  • FileReadObservation: Output from reading a file.
  • BrowserOutputObservation: Output from interacting with a browser.

Methods

__init__

This method initializes the planner agent with a language model (llm).

  • Purpose: Sets up the initial state of the agent.
  • Parameters:
    • llm: The language model that the agent will use for generating plans and actions.

class PlannerAgent:
    def __init__(self, llm):
        self.llm = llm
        self.previous_actions = []
        self.current_task = None
        self.hints = []

    def step(self):
        # Check if the current step is completed
        if self._is_step_completed():
            return AgentFinishAction()

        # Create a plan prompt based on the current context
        plan_prompt = self._create_plan_prompt()

        # Send the prompt to the model for inference
        next_action = self.llm.generate(plan_prompt)

        # Record the result as the next action
        self.previous_actions.append(next_action)
        return next_action

    def _is_step_completed(self):
        # Implementation to check if the current step is completed
        pass

    def _create_plan_prompt(self):
        # Implementation to create a plan prompt based on context
        pass

step

This method manages the primary logic for executing a step in the planning process.

  • Purpose: Executes a single step of the agent’s planning and action process.
  • Functionality:
    • Completion Check: Checks if the current step is completed.
    • Plan Prompt Creation: If not completed, creates a plan prompt based on previous actions, the current task, and hints.
    • Model Inference: Sends the prompt to the language model to generate the next action.
    • Action Execution: Records the generated action in the history of previous actions and returns it.

Example Usage in OpenDevin

# Initialize the language model (e.g., OpenAI GPT)
llm_instance = OpenAIModel()

# Initialize the planner agent with the language model
agent = PlannerAgent(llm_instance)

# Define the current task and initial hints
agent.current_task = "Set up a CI/CD pipeline"
agent.hints.append("Start with creating a GitHub repository")

# Execute steps in the planning process
while True:
    action = agent.step()
    if isinstance(action, AgentFinishAction):
        break
    print(action)

In this example:

  1. The PlannerAgent is initialized with a language model.
  2. The current task and initial hints are set.
  3. The agent executes steps iteratively, generating and executing actions until the task is deemed completed (i.e., AgentFinishAction is returned).

This methodology enables the agent to autonomously plan and execute a sequence of actions aimed at achieving a long-term goal, effectively leveraging the language model for intelligent decision-making at each step.

🏛️ System Architecture Overview

This is a high-level overview of the system architecture. The system is divided into two main components:
  • Frontend: The frontend is responsible for managing user interactions and displaying results. It provides the user interface for developers to engage with the system, input tasks, and view outputs or generated content.
  • Backend: The backend handles the business logic and execution of the agents. It processes requests from the frontend, manages data storage, and coordinates the execution of various agents, including planning and action agents, to fulfill user commands and automate tasks.
Together, these components work in concert to provide a seamless experience for users, enabling efficient task execution and intelligent decision-making through the agents.


Frontend Architecture

This overview is simplified to highlight the main components and their interactions. For a more detailed view of the backend architecture, see the Backend Architecture section below.


Backend Architecture

Disclaimer: The backend architecture is a work in progress and is subject to change. The following diagram illustrates the current architecture of the backend based on the commit that is shown in the footer of the diagram.

💿 How to use OpenDevin in OpenShift/K8S

There are various ways to deploy OpenDevin in an OpenShift or Kubernetes environment. Here, we present one example of the deployment process:

  1. Create a Persistent Volume (PV): As a cluster administrator, you will set up a PV to map the workspace_base data and Docker directory to the pod through the worker node.
  2. Create a Persistent Volume Claim (PVC): This step allows you to mount the previously created PVs to your POD.
  3. Create a POD: The POD will consist of two containers: one for OpenDevin and one for the Sandbox.

Steps to follow for the above example:

Note: Ensure you are logged in to the cluster first with the appropriate account for each step. PV creation requires a cluster administrator!

Confirm that you have read/write permissions on the hostPath used below (i.e. /tmp/workspace).

  1. Create the PVs: The sample yaml files below can be used by a cluster admin to create the PVs.
  • workspace-pv.yaml

apiVersion: v1
kind: PersistentVolume
metadata:
  name: workspace-pv
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /tmp/workspace

# apply yaml file
$ oc create -f workspace-pv.yaml
persistentvolume/workspace-pv created

# review:
$ oc get pv
NAME           CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
workspace-pv   2Gi        RWO            Retain           Available                                   7m23s

  • docker-pv.yaml

apiVersion: v1
kind: PersistentVolume
metadata:
  name: docker-pv
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /var/run/docker.sock
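For step 2 of the overview above, a PVC that binds the workspace PV might look like the following (a minimal sketch; the claim name workspace-pvc is hypothetical):

# Hypothetical claim matching the 2Gi workspace PV created above.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspace-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi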

  4. Create a NodePort service. A sample service creation command is shown below:
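A minimal sketch (the service name is hypothetical, the generated selector must match your pod's labels, and port 3000 is the UI port used throughout this guide):

# Creates a NodePort service that forwards to container port 3000.
$ oc create service nodeport opendevin --tcp=3000:3000
# Review the node port that was assigned:
$ oc get svc opendevin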

  5. Connect to the OpenDevin UI, configure the Agent, then test.

Challenges

Some of the challenges that may need to be addressed for improvement include:

  1. Install Git in the Container:

One challenge is ensuring that Git is available within the container environment. This can be resolved by building a custom image that includes Git and using that image during pod deployment.

An example, still to be tested, is sketched below.
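A hypothetical Dockerfile for this, assuming the published image is Debian-based with apt available:

# Extends the published image with Git; untested, as noted above.
FROM ghcr.io/opendevin/opendevin
USER root
RUN apt-get update && \
    apt-get install -y --no-install-recommends git && \
    rm -rf /var/lib/apt/lists/*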

  2. Mounting a Shared Development Directory:

To mount a shared development directory (e.g. one hosted on an EC2 instance) into the POD, you can share the directory with the worker node using file-sharing software such as NFS (Network File System). After setting up NFS, you can create a Persistent Volume (PV) and a Persistent Volume Claim (PVC) as described earlier to access that directory.
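As a hedged illustration, an NFS-backed PV for that shared directory could be defined like this (the server address and export path are placeholders):

# Placeholder NFS server address and export path.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: dev-nfs-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: <nfs-server-host-or-ip>
    path: /srv/shared-dev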

  3. Agent Functionality:

It’s important to note that not all agents may be operational at this time. As of recent testing, only the CoderAgent has been tested successfully with an OpenAI API key, producing positive results.

OpenDevin CodeAct 1.0

OpenDevin CodeAct 1.0 is a newly introduced agent designed for solving coding tasks. It is built on the foundation of CodeAct, a framework that consolidates LLM agents' actions into a unified code action space.

Limitations of OpenDevin

While OpenDevin offers powerful capabilities, it also has certain limitations:

  • Dependency on Providers: The performance and capabilities of OpenDevin are directly tied to the underlying LLM providers. Any limitations or changes in these services can directly affect the functionality.
  • Complexity in Configuration: Setting up and managing environment variables across different providers can be complex and error-prone, especially for beginners.
  • Resource Intensive: Running LLMs can be resource-intensive, requiring significant computational power and memory, which might not be feasible for all developers or applications.
  • Latency and Performance: Variations in response time and overall performance can occur depending on the provider and model used, potentially impacting the effectiveness of real-time applications.


Beyond these provider-related constraints, OpenDevin comes with a few practical limitations:

  1. Steep Learning Curve: Although OpenDevin is designed to simplify the coding process, it still requires a solid foundation in programming languages and technologies. Beginners may find the initial setup and usage challenging.
  2. Installation Prerequisites: The installation process necessitates specific operating systems and software versions, which might pose a barrier for some users. Furthermore, the setup process requires a certain level of programming expertise.
  3. Reliance on a Stable Network: OpenDevin's real-time interaction capabilities require a reliable internet connection. In areas with unstable connectivity, this could hinder the development process.
  4. Resource Intensive: Running OpenDevin, especially in a development environment, can be resource-intensive. Users need to ensure their systems meet the necessary hardware requirements to avoid performance issues.

Conclusion

OpenDevin represents a significant advancement in integrating AI with software development. By harnessing the power of natural language processing and enabling real-time interactions, it simplifies complex tasks and fosters a collaborative development environment. However, users must be prepared to navigate its initial learning curve and installation prerequisites. As the open-source community continues to contribute and refine OpenDevin, it holds the potential to become an indispensable tool for developers worldwide.

For more detailed information, visit the OpenDevin documentation and GitHub repository. Happy coding!

Have Queries? Join https://launchpass.com/collabnix

Adesoji Alu brings a proven ability to apply machine learning (ML) and data science techniques to solve real-world problems. He has experience working with a variety of cloud platforms, including AWS, Azure, and Google Cloud Platform, and has strong skills in software engineering, data science, and machine learning. He is passionate about using technology to make a positive impact on the world.