Adesoji Alu brings a proven ability to apply machine learning (ML) and data science techniques to solve real-world problems. He has experience working with a variety of cloud platforms, including AWS, Azure, and Google Cloud Platform. He has strong skills in software engineering, data science, and machine learning. He is passionate about using technology to make a positive impact on the world.
How to Build a Conversational Agent with OpenAI Realtime API
Imagine having a seamless, real-time conversation with an AI agent in your web application—no database setup, no additional infrastructure complexities.
This blog introduces a project that leverages OpenAI’s Realtime API to build a conversational agent with JavaScript (frontend) and Python FastAPI (backend). It provides a plug-and-play solution for organizations to integrate into their existing tech ecosystem, solving complex real-time interaction challenges effortlessly.
The Problem
Modern organizations face the challenge of creating real-time communication systems that are scalable, efficient, and easy to integrate. Many solutions require extensive infrastructure, including databases and elaborate setups for handling session states and conversational data. This adds complexity and slows down adoption. The code discussed in this blog eliminates those challenges.
How This Solution Works
This project uses OpenAI’s Realtime API to manage AI conversations in real-time. The backend, built with FastAPI, handles WebSocket communication between the front end and OpenAI’s API.
Redis is used for lightweight, temporary storage, allowing for efficient state management without the need for a full-fledged database. The frontend, built in React, facilitates seamless interaction with the user, offering text and voice communication capabilities.
Features:
Voice and Text Communication
Real-time AI responses
Session management using WebSockets
Support for audio transcription and custom AI instructions
Plug-and-play configuration for multiple organizations
Downloadable conversation history
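The last feature, downloadable conversation history, can be sketched as a small serializer. The message shape (`role`/`content` dicts) and the export envelope are assumptions for illustration; the repository may store messages differently.

```python
import json
from datetime import datetime, timezone

def export_history(messages: list[dict]) -> str:
    """Serialize a conversation into a downloadable JSON document,
    stamped with the export time in UTC."""
    return json.dumps(
        {
            "exported_at": datetime.now(timezone.utc).isoformat(),
            "messages": messages,
        },
        indent=2,
    )

history = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi, how can I help?"},
]
print(export_history(history))
```

The backend can return this string with a `Content-Disposition: attachment` header, or the frontend can offer it as a Blob download.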
How It Solves Realtime Problems
This solution tackles real-time challenges like latency and scalability. By leveraging WebSocket communication, the system provides near-instantaneous responses, making it suitable for applications
like customer support, virtual assistants, and real-time decision-making systems. It’s simple to integrate, allowing organizations to focus on enhancing user experiences rather than dealing with technical bottlenecks.
Why It’s Useful for Organizations:
Eliminates the need for complex database setups
Scalable to handle multiple users and organizations
Easy to configure and integrate into existing tech solutions
Supports audio and text modalities for versatile use cases
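The multi-organization, plug-and-play point can be illustrated with a per-organization configuration lookup. The field names (`instructions`, `voice`, `modalities`) mirror options the Realtime API accepts in a session update, but this particular config table is a hypothetical sketch, not the repository's actual schema.

```python
# Hypothetical per-organization settings; field names are assumptions.
ORG_CONFIGS = {
    "acme": {
        "instructions": "You are Acme's support assistant. Be concise.",
        "voice": "alloy",
        "modalities": ["text", "audio"],
    },
    "default": {
        "instructions": "You are a helpful assistant.",
        "voice": "alloy",
        "modalities": ["text"],
    },
}

def get_org_config(organization: str) -> dict:
    """Look up an organization's settings, falling back to defaults."""
    return ORG_CONFIGS.get(organization, ORG_CONFIGS["default"])
```

Since the organization name is already part of the WebSocket route (`/gpt-api/chat_stream/{organization}/{request_id}`), onboarding a new tenant is just adding one entry to this table.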
Code Walkthrough
Backend: Python with FastAPI
The backend, written in Python using FastAPI, acts as a bridge between the frontend and OpenAI’s API. It manages sessions, handles user input, and processes AI responses in real-time.
import os
import json
import asyncio
from fastapi import FastAPI, WebSocket
from fastapi.middleware.cors import CORSMiddleware
from websockets.client import connect

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:5173"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

OPENAI_WS_URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01"

@app.websocket("/gpt-api/chat_stream/{organization}/{request_id}")
async def chat_stream(websocket: WebSocket, organization: str, request_id: str):
    await websocket.accept()
    # The Realtime API authenticates the WebSocket handshake itself,
    # so the API key and beta header must be sent with the connection.
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    async with connect(OPENAI_WS_URL, extra_headers=headers) as openai_ws:
        # Logic for handling communication with OpenAI's API
        pass
The backend uses WebSocket routes to manage real-time conversations, enabling seamless interaction with OpenAI’s Realtime API.
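The heart of that handler is a bidirectional relay: user input flows to OpenAI while AI events flow back, both at once. Here is a minimal sketch of that pattern, with `FakeSocket` standing in for the two real WebSocket connections so the snippet runs standalone; the `recv`/`send` method names and the `None`-means-closed convention are assumptions for illustration.

```python
import asyncio

async def relay(recv, send):
    """Pump messages from one endpoint to the other until the source
    closes (signalled here by recv() returning None)."""
    while True:
        message = await recv()
        if message is None:  # connection closed
            break
        await send(message)

async def bridge(client, upstream):
    """Run both directions concurrently, as the FastAPI handler would."""
    await asyncio.gather(
        relay(client.recv, upstream.send),   # user input -> OpenAI
        relay(upstream.recv, client.send),   # AI events -> user
    )

class FakeSocket:
    """In-memory stand-in for a WebSocket, for demonstration only."""
    def __init__(self):
        self.inbox = asyncio.Queue()
        self.sent = []
    async def recv(self):
        return await self.inbox.get()
    async def send(self, msg):
        self.sent.append(msg)

async def demo():
    client, upstream = FakeSocket(), FakeSocket()
    await client.inbox.put('{"type": "message", "text": "hi"}')
    await client.inbox.put(None)
    await upstream.inbox.put('{"type": "response", "text": "hello"}')
    await upstream.inbox.put(None)
    await bridge(client, upstream)
    return upstream.sent, client.sent

sent_up, sent_down = asyncio.run(demo())
print(sent_up)    # ['{"type": "message", "text": "hi"}']
print(sent_down)  # ['{"type": "response", "text": "hello"}']
```

Running both directions under `asyncio.gather` is what keeps the conversation full-duplex: the user can keep speaking while the model is still streaming its previous response.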
For the complete backend code, visit the GitHub repository: https://github.com/Adesoji1/OpenaiRealtime-API.
Frontend: React with JavaScript
The React frontend allows users to interact with the AI via a chat interface. It supports both text and voice inputs, providing a rich, user-friendly experience.
The frontend is designed to dynamically display AI responses and allows users to send audio or text messages. For the complete frontend code, check the same GitHub repository: https://github.com/Adesoji1/OpenaiRealtime-API.
How to Get Started
Clone the repository: git clone https://github.com/Adesoji1/OpenaiRealtime-API.git
Follow the instructions in the README.md and install dependencies for both the frontend and the backend. The backend uses a Python 3.11.6 environment, while the frontend uses Node v20.12.2 and npm 10.5.0.
Set up your OpenAI API key in an environment file (or directly in the code) after obtaining it from your OpenAI dashboard, and monitor your usage there as well.
Run the backend server with uvicorn main:app --host 0.0.0.0 --port 8000 and the frontend from the frontend directory with npm run dev.
Access the application at http://localhost:5173.
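For the API-key step above, a fail-fast loader in the backend avoids the common pitfall of the server starting and then failing on the first request. `OPENAI_API_KEY` is the conventional variable name; confirm it matches the repository's README.

```python
import os

def get_api_key() -> str:
    """Read the OpenAI key from the environment and fail fast if absent."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "Set OPENAI_API_KEY (e.g. in a .env file) before starting the backend."
        )
    return key
```

Keeping the key in an environment file rather than in the code also keeps it out of version control.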
View the application logs for the Redis and uvicorn servers in the images above and below.
This project demonstrates the potential of leveraging OpenAI’s Realtime API for building scalable and efficient conversational agents. With minimal setup and a simple architecture, it offers a powerful solution for organizations looking to integrate AI-powered real-time interactions into their platforms. Also, note that introducing a cache can be an efficient way to keep your API calls from hitting rate limits. OpenAI’s official Realtime reference implementation targets JavaScript, but we at Collabnix were able to develop a solution with a Python backend.
Have Queries? Join https://launchpass.com/collabnix