This blog was originally contributed by Collabnix community member Mewantha Bandara for the KubeLabs GitHub repository.
Kubernetes is highly distributed, with each pod running a separate application. This in itself is not a problem. Even if you have hundreds of pods running a hundred different applications, Filebeat is more than capable of keeping track of all those open log files that are constantly being written to and passing their contents on to Logstash. Logstash then manages the inflow of logs to make sure Elasticsearch isn't overwhelmed. The problems start appearing when you have a sudden surge in the number of pods. This is not something you see in everyday Kubernetes use cases, but if you are running autoscaling jobs that scale massively up and down depending on the workload, it can happen. One good example is KEDA.
KEDA looks at whatever metric you specify and massively scales jobs (basically pods) up and down to handle the demand. If each of these jobs writes a log to a common data source (such as EFS or another network file system), you could potentially have hundreds of open log files being written to concurrently. At that point, a single instance of Filebeat may be unable to keep track of all these log files and either end up skipping some logs or stop pushing logs entirely. The solution is either to run multiple replicas of Filebeat or to launch Filebeat as a sidecar container in each pod that comes up.
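To make the scenario concrete, here is a minimal, purely illustrative ScaledJob sketch. It is not part of the original lab; the RabbitMQ trigger, queue name, and connection string are made-up values, and any supported KEDA scaler could be used instead:

apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: log-producing-job
spec:
  jobTargetRef:
    template:
      spec:
        containers:
        - name: worker
          image: ubuntu
          command: ["sleep", "20"]
        restartPolicy: Never
  maxReplicaCount: 100        # hundreds of pods (and log files) can exist at once
  triggers:
  - type: rabbitmq            # hypothetical trigger, for illustration only
    metadata:
      queueName: work-queue
      host: amqp://guest:guest@rabbitmq:5672/
      queueLength: "5"

Each job pod that KEDA spins up would write its own log file to the shared data source, which is exactly the situation that can overwhelm a single Filebeat instance.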
In this blog post, we will be discussing how we can launch Filebeat as a sidecar container, and the various pitfalls to look out for when doing this. But first, you need to know what a sidecar container is. The concept is explained in detail in the Pods101 section, but in summary, it is simply a case where you run two containers inside a single pod. Generally, this consists of one main container that runs your application, and one support container that runs another application designed to support the first. In this case, the support application is the Filebeat container that pushes the logs created by the main container to Elasticsearch. Note that while we call the containers "main" and "sidecar", Kubernetes itself does not make this distinction. This is not an issue if you are running a regular application in a pod that runs forever, since you want both the application and the Filebeat sidecar to run continuously. However, if you are running a job, you want the job's pod to terminate and disappear once the work finishes. That becomes a problem, because Filebeat would keep running after the main container finishes, meaning the pod (and therefore the job) would hang around forever. We will be looking at a workaround to mitigate this.
Let’s start by defining a Kubernetes job. If you want a deep dive into Kubernetes jobs, take a look at the Jobs101 section. We will be using the same example used there and adding the filebeat sidecar onto it. To keep things simple, we will use the non-parallel-job.yml. Deploying this file to your cluster will create a job that starts, sleeps for 20 seconds, then succeeds and leaves. We will be editing the YAML to add the Filebeat sidecar:
apiVersion: batch/v1
kind: Job
metadata:
  name: wait
spec:
  template:
    metadata:
      name: wait
    spec:
      containers:
      - name: wait
        image: ubuntu
        volumeMounts:
        - name: data
          mountPath: /data/
        command: ["sleep", "20"]
      - name: filebeat-sidecar
        image: elastic/filebeat:5.6.16
        volumeMounts:
        - name: data
          mountPath: /data/
        command: ["/bin/sh", "-c"]
        args: ["/usr/share/filebeat/filebeat -e -c /usr/share/filebeat/filebeat.yml & while [ ! -f /data/completion-flag ]; do sleep 1; done && exit 0"]
      restartPolicy: Never
This is the chunk that was added:
- name: filebeat-sidecar
  image: elastic/filebeat:5.6.16
  volumeMounts:
  - name: data
    mountPath: /data/
  command: ["/bin/sh", "-c"]
  args: ["/usr/share/filebeat/filebeat -e -c /usr/share/filebeat/filebeat.yml & while [ ! -f /data/completion-flag ]; do sleep 1; done && exit 0"]
Let’s take a closer look at this chunk. We define a container called “filebeat-sidecar” and specify that the image is Filebeat version 5.6.16. We also mount a shared volume, which we will get to later, and finally run the Filebeat command. This command may look a little complicated, so let’s break it down. First, we have:
/usr/share/filebeat/filebeat -e -c /usr/share/filebeat/filebeat.yml
This is the actual Filebeat command. By default, Filebeat lives in /usr/share/filebeat/, which contains both the Filebeat executable and the filebeat.yml that defines how Filebeat should behave. Since we will be using the default filebeat.yml, we will not be overriding it here. However, keep in mind that to override it, you only have to specify a volume mount:
- name: filebeat-config
mountPath: /usr/share/filebeat/filebeat.yml
subPath: filebeat.yml
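For completeness, here is a minimal sketch of what could back that mount. The ConfigMap name and the filebeat.yml contents (a log prospector on /data/*.log shipping to a Logstash service on port 5044, using the Filebeat 5.x syntax) are assumptions for illustration and not part of the original lab:

apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
data:
  filebeat.yml: |
    filebeat.prospectors:
    - input_type: log
      paths:
        - /data/*.log
    output.logstash:
      hosts: ["logstash-logstash:5044"]

The pod spec would then also need a matching volume definition:

volumes:
- name: filebeat-config
  configMap:
    name: filebeat-config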
Right after the command that runs Filebeat, you will see a while loop. Let’s get into this.
As mentioned before, a job does not run infinitely. However, Filebeat does. Since Kubernetes does not distinguish between a main and a sidecar container, Filebeat would run forever and hold up the pod even after the main container has finished running. This is where we use a slightly unconventional means of detecting whether the main container has finished. We start by mounting a volume called “data”:
volumeMounts:
- name: data
mountPath: /data/
Notice that we mount this same path in both the main and sidecar containers, which gives us a single mount that both containers can read from and write to. We will now add steps to the YAML so that when the main container finishes running, a file called “completion-flag” is created in the shared “data” directory. Meanwhile, from the moment the Filebeat container starts, it keeps checking for this file. As soon as the file appears, the exit 0 command runs and the Filebeat container stops. That way both containers stop at almost the same time and the job can finish.
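One thing the manifest above does not show is the definition of the “data” volume itself; the volumeMounts only reference it. For two containers that simply need to share files within the same pod, an emptyDir is the simplest option. A minimal sketch, under that assumption (a network file system such as EFS would also work):

volumes:
- name: data
  emptyDir: {}
# this goes under spec.template.spec, alongside containers and restartPolicy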
The sleep command will be modified like so:
command: ["sleep 20; touch /data/completion-flag"]
We use ; instead of && so that even if the sleep command fails, the file still gets created. On the Filebeat side, this is the command that runs in the Filebeat container:
args: ["/usr/share/filebeat/filebeat -e -c /usr/share/filebeat/filebeat.yml & while [ ! -f /data/completion-flag ]; do sleep 1; done && exit 0"]
We’ve already looked at the Filebeat command itself, so let’s take a look at the second half of this command:
while [ ! -f /data/completion-flag ]; do sleep 1; done && exit 0
This is a while loop that runs for as long as no file called “completion-flag” is present. Once the flag shows up, exit 0 is called and the Filebeat container stops running.
Now that we have fully explored the manifest, let’s go ahead and perform the deployment. If you have Filebeat/Fluentd deployed to your cluster from the previous sections, make sure to remove it, since Filebeat now comes bundled with the job YAML. Then go ahead and deploy:
kubectl apply -f non-parallel-job.yml
Now let’s observe each container. Since we need to watch both containers of the pod, let’s use the kubectl describe command:
kubectl get po
Note the name of the pod, and use it in the below command:
watch kubectl describe pod <POD_NAME>
You should see two containers described under the Containers section. Watch as the state of both containers goes from Waiting to Running. When the container running the sleep command moves to a Terminated state with reason Completed, the container running Filebeat should stop almost immediately afterwards. Once both containers have terminated, the pod is marked Completed and the job finishes.
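If you also want to confirm that the sidecar was actually shipping logs before it stopped, you can check its output by container name (the names come from the manifest above):

kubectl logs <POD_NAME> -c filebeat-sidecar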
This brings us to the end of this section on logging with Filebeat sidecars. You can use the same concept with similar tools such as Fluentd if you plan to scale your jobs (and therefore your logs) massively. Just make sure that there are no bottlenecks at other points in the pipeline, such as Logstash or Elasticsearch.
We have already covered Fluent Bit, so you know that it is far more lightweight than either Filebeat or Fluentd. In fact, according to benchmarks published by AWS, Fluent Bit uses roughly five times less memory than Fluentd and about half the memory of Filebeat. So when we run hundreds of jobs at the same time, and therefore create one logger instance per pod, it makes a lot of sense to use a log shipper that pushes your log lines with as little resource consumption as possible.
In the next section, let’s take a look at how we can use Fluentbit as a sidecar container to push logs.
The concept behind the Fluent Bit sidecar container is basically the same as with the Filebeat sidecar. The differences are in the Fluent Bit configuration, which uses a different syntax, and in the way the Fluent Bit container is loaded into the pod. We will continue to use the same Ubuntu job as before, and the same trick of using a completion flag to signal when the sidecar should stop still applies. We will also use the same shared volume, and a ConfigMap to load the Fluent Bit configuration. Below is the Fluent Bit config that matches the Filebeat config we had in the previous section:
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentbit-configmap
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush        1
        Log_Level    info
        Daemon       off
    [INPUT]
        Name         tail
        Path         /data/*.log
        Tag          mixlog
    [OUTPUT]
        Name         http
        Match        mixlog
        Host         logstash-logstash
        Port         5044
The first five lines are already familiar to you. We then start the Fluent Bit config: some service-level settings first, followed by the input definition. As before, we use the tail plugin to pick up all the log files found in /data/ and tag them with “mixlog”. We then match those tagged records in the output plugin and stream the logs to the Logstash service. You will notice that while Filebeat speaks the Beats protocol, for which Logstash has a native “beats” input, Fluent Bit does not. However, we can use the “http” output instead. On the Logstash side, you will have to change the input from “beats” to “http”, but apart from that, everything should work just fine.
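As a rough illustration, here is what that change could look like in the Logstash pipeline. This is a minimal sketch, not taken from the original lab; the port and the Elasticsearch address are assumptions, and codec/settings may need tuning for your setup:

input {
  http {
    port => 5044        # accepts the Fluent Bit http output defined above
  }
}
output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
  }
}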
Now let’s look at the Kubernetes manifest side. It is basically the same as what we had with Filebeat, except that we use the Fluent Bit image. We also point Fluent Bit at the overriding fluent-bit.conf, which is mounted from the ConfigMap, in the same way the filebeat.yml override worked in the Filebeat YAML. Apart from that, everything stays the same.
- name: fluent-bit-sidecar
  image: cr.fluentbit.io/fluent/fluent-bit:2.2.2
  volumeMounts:
  - name: fluent-bit-config
    mountPath: /fluent-bit/etc/
    readOnly: true
  - name: shared-data
    mountPath: /data/
  command: ["/bin/sh", "-c"]
  args: ["/fluent-bit/bin/fluent-bit -c /fluent-bit/etc/fluent-bit.conf & while [ ! -f /data/completion-flag ]; do sleep 1; done && exit 0"]
  # note: the stock Fluent Bit image ships without a shell (see the conclusion below),
  # so this shell-based completion-flag trick needs an image variant that includes one,
  # such as a debug build of the image
Now that we have covered both areas that need to be changed, let’s go ahead and give this a test run. First off, deploy the ConfigMap:
kubectl apply -f fluentbit-configmap.yaml
Next, apply the job manifest:
kubectl apply -f non-parallel-job.yml
Now let’s observe the containers in the same way we did with the filebeat sidecars.
kubectl get po
Note the name of the pod, and use it in the below command:
watch kubectl describe pod <POD_NAME>
You should see two containers described under the Containers section. Watch as the state of both containers goes from Waiting to Running. When the container running the sleep command moves to a Terminated state with reason Completed, the container running Fluent Bit should stop almost immediately afterwards. Once both containers have terminated, the pod is marked Completed and the job finishes.
Conclusion
This brings us to the end of the section on running Fluent Bit as a sidecar container. Now, you may be asking: if Fluent Bit does the same things as Filebeat with a much smaller resource footprint, why use Filebeat at all? The answer is features. For example, Logstash supports the Beats protocol natively, but there is no equivalent for Fluent Bit; you have to fall back to HTTP, which may change how the output is presented in Kibana. Larger log shippers such as Fluentd also support built-in grok parsing, which Fluent Bit doesn’t. Instead, you have to push logs from Fluent Bit to Fluentd (or to Logstash, as we do here), which adds another component acting as a mediator. Since Logstash also handles buffering so that Elasticsearch doesn’t get overwhelmed, that isn’t a particularly terrible trade-off. Additionally, the stock Fluent Bit image does not include tools like bash or sh, which means that if you want to exec into the Fluent Bit container to look around for some reason, you won’t be able to do so.
So there is a trade-off and you will have to consider what is best for your use case.