A large language model (LLM) is a deep learning model that can perform a wide range of natural language processing (NLP) tasks. LLMs are built on the transformer architecture and are trained on massive datasets of text and code, hence "large." This training enables them to recognize, translate, predict, and generate text and other content, from creative writing to working code.
What are the benefits of using LLMs?
LLMs offer a number of benefits over traditional NLP techniques, including:
- They can handle more complex tasks, such as machine translation and question answering.
- They typically achieve higher accuracy than traditional NLP techniques across a wide range of language tasks.
- They can be used to generate more creative and informative text.
- They can be adapted to new tasks more easily than traditional techniques.
What are the challenges of using LLMs?
LLMs also have some challenges, including:
- They require a lot of data to train.
- They can be computationally expensive to train and deploy.
- They can be biased, reflecting the biases in the data they are trained on.
- They can be used to generate harmful or offensive content.
How are LLMs being used today?
LLMs are being used in a variety of ways today, including:
- Chatbots: LLMs can be used to create chatbots that can have natural conversations with humans.
- Question-answering systems: LLMs can be used to build question-answering systems that can answer questions posed in natural language.
- Natural language generation systems: LLMs can be used to build natural language generation systems that can generate text, translate languages, and write different kinds of creative content.
- Code generation systems: LLMs can be used to build code generation systems that can generate code from natural language descriptions.
- Data analysis systems: LLMs can be used to build data analysis systems that can extract insights from data.
Introducing Serge
Serge is an open-source chat platform for LLMs that makes it easy to self-host and experiment with LLMs locally. It is fully dockerized, so you can easily containerize your LLM app and deploy it to any environment.
This blog post will walk you through the steps on how to containerize an LLM app with Serge.
Prerequisites
To follow this tutorial, you will need the following:
- A computer with Docker installed
- The Serge source code
- A pre-trained LLM model
Note:
Make sure you have enough disk space and available RAM to run the models. The 7B model requires about 4.5 GB of free RAM, the 13B model requires about 12 GB, and the 30B model requires about 20 GB.
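On Linux, you can check how much RAM and disk space is available before downloading a model (an optional sanity check; the exact output format varies by distribution):
free -h
df -h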
Step 1: Create a new directory for your app
First, create a new directory for your app.
mkdir my-app
cd my-app
Step 2: Clone the Serge repository
Next, clone the Serge repository into your app directory.
git clone https://github.com/serge-chat/serge.git
Step 3: Create a Dockerfile
Now, you need to create a Dockerfile for your app. The Dockerfile is a text file that tells Docker how to build your app image.
In your app directory, create a new file called Dockerfile.
nano Dockerfile
Paste the following code into the Dockerfile:
# Use the latest Serge image as the base image
FROM ghcr.io/serge-chat/serge:latest

# Copy the pre-trained model into the image
COPY my-model.pkl /app/

# Run the app when a container starts
CMD ["python", "app.py"]
This Dockerfile tells Docker to use the latest version of the Serge image as the base image. It then copies the pre-trained LLM model to the /app directory and runs the app.py script when the image is run.
Step 4: Build the Docker image
Once you have created the Dockerfile, you can build the Docker image for your app.
docker build -t my-app .
This will create a Docker image called my-app.
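If you want to confirm that the image was built, you can list it (an optional check):
docker images my-app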
Step 5: Run the Docker image
Finally, you can run the Docker image for your app.
docker run -it -p 8008:8008 my-app
This will start a containerized instance of your LLM app with port 8008 published to the host. You can then connect to the app in a web browser at http://localhost:8008.
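In another terminal, you can check that the container is running and inspect its logs (the container ID shown by docker ps is a placeholder here):
docker ps --filter ancestor=my-app
docker logs <container-id>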
Step 6: Run Serge with Docker Compose
Alternatively, you can run Serge directly with Docker Compose instead of building a custom image. Create a file called docker-compose.yml with the following contents:
services:
  serge:
    image: ghcr.io/serge-chat/serge:latest
    container_name: serge
    restart: unless-stopped
    ports:
      - 8008:8008
    volumes:
      - weights:/usr/src/app/weights
      - datadb:/data/db/

volumes:
  weights:
  datadb:
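With the file in place, start the stack from the same directory (assuming the file is named docker-compose.yml):
docker compose up -d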
Then visit http://localhost:8008/. You can find the API documentation at http://localhost:8008/api/doc.
How to Deploy Serge on Kubernetes
You can deploy Serge using the manifests below; they contain the resources required to run it on a Kubernetes cluster.
Save the following as manifest.yaml and adjust it for your setup:
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: serge
  name: serge
  namespace: serge-ai
spec:
  ports:
    - name: "8008"
      port: 8008
      targetPort: 8008
    - name: "9124"
      port: 9124
      targetPort: 9124
  selector:
    app: serge
status:
  loadBalancer: {}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: serge
  name: serge
  namespace: serge-ai
spec:
  replicas: 1
  selector:
    matchLabels:
      app: serge
  template:
    metadata:
      labels:
        app: serge
    spec:
      containers:
        - image: ghcr.io/serge-chat/serge:latest
          name: serge
          ports:
            - containerPort: 8008
            - containerPort: 9124
          resources:
            requests:
              cpu: 5000m
              memory: 5120Mi
            limits:
              cpu: 8000m
              memory: 8192Mi
          volumeMounts:
            - mountPath: /data/db
              name: datadb
            - mountPath: /usr/src/app/weights
              name: weights
      restartPolicy: Always
      volumes:
        - name: datadb
          persistentVolumeClaim:
            claimName: datadb
        - name: weights
          persistentVolumeClaim:
            claimName: weights
status: {}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    app: serge
  name: weights
  namespace: serge-ai
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 64Gi
status: {}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    app: serge
  name: datadb
  namespace: serge-ai
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 16Gi
status: {}
---
You can now deploy Serge with the following commands:
$ kubectl create ns serge-ai
$ kubectl apply -f manifest.yaml
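You can check that the Deployment has rolled out, and reach the UI before configuring an ingress, with commands like these (a quick sketch; the service name and port come from the manifest above):
$ kubectl rollout status deployment/serge -n serge-ai
$ kubectl port-forward -n serge-ai svc/serge 8008:8008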
You can add the supported Alpaca models with the following commands after retrieving the pod name:
$ kubectl get pod -n serge-ai
NAME READY STATUS RESTARTS AGE
serge-58959fb6b7-px76v 1/1 Running 0 8m42s
$ kubectl exec -it serge-58959fb6b7-px76v -n serge-ai -- python3 /usr/src/app/api/utils/download.py tokenizer 7B
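Once the download finishes, you can confirm the files landed on the weights volume (the mount path comes from the Deployment above):
$ kubectl exec -it serge-58959fb6b7-px76v -n serge-ai -- ls -lh /usr/src/app/weights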
If you have an IngressClass on your cluster, it is possible to use Serge behind an ingress. Below is an example with an Nginx IngressClass:
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: serge-ingress
  namespace: serge-ai
  annotations:
    nginx.ingress.kubernetes.io/configuration-snippet: |
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection upgrade;
      proxy_set_header Accept-Encoding gzip;
    nginx.org/websocket-services: serge
    nginx.ingress.kubernetes.io/cors-allow-methods: "PUT, GET, POST, OPTIONS, DELETE"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - MY-DOMAIN.COM  # EDIT HERE
      secretName: serge-tls
  rules:
    - host: MY-DOMAIN.COM  # EDIT HERE
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: serge
                port:
                  number: 8008
---
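Apply the Ingress and check that it has picked up an address (assuming you saved it as ingress.yaml):
$ kubectl apply -f ingress.yaml
$ kubectl get ingress -n serge-ai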
If you have cert-manager installed, you can issue a TLS certificate with the following YAML file:
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: serge-tls
  namespace: serge-ai
spec:
  secretName: serge-tls
  issuerRef:
    name: acme-issuer
    kind: ClusterIssuer
  dnsNames:
    - MY-DOMAIN.COM  # EDIT HERE
  privateKey:
    algorithm: RSA
    encoding: PKCS1
    size: 4096
---
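Apply it and wait for the certificate to become Ready (assuming you saved it as certificate.yaml and the acme-issuer ClusterIssuer already exists in your cluster):
$ kubectl apply -f certificate.yaml
$ kubectl get certificate -n serge-ai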
Conclusion
This blog post has shown you how to containerize a large language model app with Serge. By following these steps, you can deploy your LLM app with Docker, Docker Compose, or Kubernetes in just about any environment.