How to Effectively Monitor and Manage Cron Jobs

Cron jobs are an essential tool for any developer who needs to automate tasks on a regular basis. From cleaning up log files to sending emails and running backups, cron jobs let you schedule tasks to run automatically at specific intervals, freeing up your time for more important work.

However, while cron jobs are a powerful tool, they can also be tricky to manage. With so many tasks running in the background, it can be difficult to keep track of what’s happening and to troubleshoot any issues that arise.

In this article, we’ll explore some of the most common use cases for cron jobs, best practices for scheduling them, and different ways to monitor and manage them.

Common use cases of cron jobs

1. Taking DB backups

If you run your own database server instead of using a managed service like RDS, it is critical to take regular backups of your database and store them in a different location, such as an S3 bucket. Since this needs to happen daily, or even more often depending on your use case, you will want to automate it, and one way to do that is with cron jobs. You can write a bash script that creates the backup and pushes it to S3 or your desired location, and then schedule a cron job to run it daily.
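A minimal sketch of such a backup script is shown here. It assumes PostgreSQL (`pg_dump`) and the AWS CLI are installed, neither of which the article specifies; the database name, bucket, paths and script location are hypothetical placeholders.

```shell
#!/bin/sh
# Daily database backup sketch. Assumes PostgreSQL and the AWS CLI.
# DB_NAME, BUCKET and BACKUP_DIR are hypothetical placeholders.
# Possible crontab entry (script path is an assumption):
#   0 2 * * * /usr/local/bin/db_backup.sh --run >> /var/log/db_backup.log 2>&1
set -eu

DB_NAME="${DB_NAME:-app_production}"
BUCKET="${BUCKET:-s3://example-db-backups}"
BACKUP_DIR="${BACKUP_DIR:-/var/backups/db}"

# Build a dated file name, e.g. app_production-2024-01-31.dump
backup_name() {
  printf '%s-%s.dump' "$1" "$2"
}

run_backup() {
  file="$BACKUP_DIR/$(backup_name "$DB_NAME" "$(date +%Y-%m-%d)")"
  mkdir -p "$BACKUP_DIR"
  pg_dump --format=custom --file="$file" "$DB_NAME"  # custom-format dump
  aws s3 cp "$file" "$BUCKET/"                       # copy the dump off the server
  rm -f "$file"                                      # free local disk space
}

# Only run the backup when called with --run, so the helpers can be tested safely.
if [ "${1:-}" = "--run" ]; then
  run_backup
fi
```

Because of `set -eu`, the script stops at the first failing step, so a failed dump is never uploaded and the local file is kept for inspection.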

2. Rotating log files

This use case can seem less critical, but it has caused server failures far too many times, even for teams with highly skilled engineers. Log rotation basically means archiving your current log file and starting a new one. Say you have a web application served by Puma: for every request to your app, it logs the request along with other metadata to the puma.log file. If you don’t monitor the size of the log file, one day it will fill the disk, you will start getting errors like “No disk space available”, and eventually the app and the server will crash, causing downtime for your customers. Hence log rotation is critical to the overall health of your server.

This is a mundane task that needs to be done at least once a day, and a classic use case for a cron job.
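As a rough illustration, rotation can be sketched as a copy-and-truncate function. This is one possible approach, not necessarily what any given team uses (the logrotate utility is the usual off-the-shelf choice); the script path, log path and retention period are hypothetical.

```shell
#!/bin/sh
# Copy-and-truncate log rotation sketch; paths and retention are hypothetical.
# Possible crontab entry:
#   0 0 * * * /usr/local/bin/rotate_log.sh /var/www/app/log/puma.log 7
set -eu

# rotate_log FILE KEEP_DAYS
rotate_log() {
  log_file=$1
  keep_days=$2
  [ -s "$log_file" ] || return 0              # empty or missing: nothing to rotate
  archive="$log_file.$(date +%Y%m%d-%H%M%S)"
  cp "$log_file" "$archive"                   # back up the current contents
  : > "$log_file"                             # truncate in place
  gzip "$archive"                             # compress the archive
  # Remove compressed archives older than KEEP_DAYS days.
  find "$(dirname "$log_file")" -maxdepth 1 -name "$(basename "$log_file").*.gz" \
    -mtime +"$keep_days" -exec rm -f {} +
}

# Rotate only when called with exactly two arguments: FILE and KEEP_DAYS.
if [ "$#" -eq 2 ]; then
  rotate_log "$1" "$2"
fi
```

Truncating in place means the app server does not need to be restarted or signalled, at the cost of possibly losing lines written between the copy and the truncate.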

3. Generating invoices or running other business logic

Let’s say you need to generate invoices on the 1st of every month and you want to automate this. You can put the invoice generation logic in a file and then use a cron job to run that file on the 1st of every month. The same applies to any other business logic you might have.

How to effectively monitor and manage cron jobs

Looking at the above use cases, it is clear that these cron jobs must run at the time they are scheduled for and complete without raising any exceptions. That makes it important to monitor them and to understand why they can fail. One of the biggest reasons a cron job fails is that best practices were not followed when it was scheduled.

Let’s look at some best practices that can make a developer’s life a little easier:

Best practices while scheduling cron jobs

1. Ability to test jobs before scheduling them

Developers are often in a hurry and take simple things for granted. For instance, a developer might want to run the command rake invoices::quaterly_generate once a month. This command generates invoices for clients on the quarterly billing cycle and emails each client the generated invoice as an attachment.

For a developer, a test run of this job would also send emails to real customers, which will definitely not go down well with them. So the job should send emails only when it is explicitly told to. When a job is designed with testing in mind, it has a much lower chance of failing once it is scheduled.
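One way to build in such a trigger is an opt-in environment variable. The sketch below is only illustrative: the `SEND_EMAILS` variable and the `deliver_invoice` function are hypothetical names, and the `echo` stands in for the real mail step.

```shell
#!/bin/sh
# Dry-run guard sketch: customers are only emailed when SEND_EMAILS=1.
# deliver_invoice is a hypothetical stand-in for the real mail step.
set -eu

deliver_invoice() {
  client=$1
  if [ "${SEND_EMAILS:-0}" = "1" ]; then
    echo "SENT invoice to $client"                  # the real job would send the mail here
  else
    echo "DRY RUN: would send invoice to $client"   # safe default for test runs
  fi
}

# Process every client name passed on the command line.
for client in "$@"; do
  deliver_invoice "$client"
done
```

With this design, a manual test run is harmless by default, and only the scheduled crontab entry sets `SEND_EMAILS=1` explicitly.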

2. Follow the Principle of Least Privilege (PoLP)

This is a security practice in which each user or set of users gets access only to the specific data, applications and utilities they need. Today a lot of companies build their products on a microservices-based architecture, and often the resources (server / pod / container) are shared across services which are owned by different teams. So a lot of people end up having access to an instance without full knowledge of the other apps running on it. This kind of access can lead to issues in the application, especially when these background jobs are core to the smooth operation of the business.

3. Logging the output the right way

A crontab will have entries to run lots of jobs. By default, cron does not write a job’s output to a log file: the system log (for example /var/log/syslog on Debian-based systems) only records that each command was started, and any output is mailed to the crontab’s owner if mail is set up on the server. Either way, it is incredibly hard to get information about a specific job. A better approach is to configure a separate log path for each job based on its criticality. For example, for the invoice generation job from earlier, it might be important to log the output in a separate file, which makes it easier to debug later.

Configuring this is simple; it is just a matter of thinking through all the possible use cases. You can send both output and errors to one file, or log the errors to a separate file:


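# Append both output and errors to a single log file
01 14 * * * rake invoices::quaterly_generate >> /home/log/invoices.log 2>&1
# Append output and errors to separate files (2>> appends; 2> would overwrite the error log on every run)
01 14 * * * rake invoices::quaterly_generate >> /home/log/invoices.log 2>> /home/log/invoices_errors.log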

Alternatively, you can use the MAILTO variable in the crontab to have the output of each job emailed to the address you specify. From a debugging perspective, though, I would still recommend logging to files.

4. Forever running jobs

There can be scenarios where a job never ends and waits forever for a response, until the server restarts or someone kills the job manually. Such jobs can eat up your server’s resources and affect other core business applications running on the same server. There are 3 ways to keep this from happening:

-> First, give the command being scheduled an internal timeout. If the job does not complete in N seconds, it times out by itself, so resources are not hogged by endless jobs.

-> Second, enforce a timeout from the crontab entry. Cron itself will let a task run indefinitely, but you can wrap the command in the timeout utility from coreutils like this:

0 1 * * * timeout 10s cronjob

-> Third, use Dr. Droid, which helps you connect independent events in real time and get alerts on missing events. You can use their SDK to publish an event at the start of the cron job and another at its completion. Then set up a monitor to get alerts for individual or aggregate failures of your cron jobs to complete, and to generate metrics on the time the jobs take.
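The second option can be sketched as a small wrapper around the coreutils `timeout` command. The wrapper name `run_with_timeout` is hypothetical, and the sketch assumes GNU `timeout`, which exits with status 124 when the time limit is hit.

```shell
#!/bin/sh
# Timeout wrapper sketch around GNU coreutils `timeout`.
# run_with_timeout LIMIT COMMAND [ARGS...]
run_with_timeout() {
  limit=$1
  shift
  status=0
  # Send TERM after $limit, then KILL 5 seconds later if the job ignores TERM.
  timeout -k 5s "$limit" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    # 124 is the exit status GNU timeout uses when the command timed out.
    echo "job timed out after $limit: $*" >&2
  fi
  return "$status"
}
```

Writing the timeout message to stderr means it ends up in whatever error log the crontab entry redirects to, so a killed job leaves a trace instead of disappearing silently.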

Different ways to monitor cron jobs

1. Manually checking the status of cron jobs

I call this the “smash the hammer” way. It means that you log in to your server and verify whether the cron job actually ran. For example, to verify log rotation, you would log in to the server, go to the log directory and check whether the newly rotated log file exists. You might also check the size of the application.log file to verify that it is smaller than the last time you saw it.

Now imagine your application is running on 10 different instances. That would mean manually logging in to each server and doing this daily, for every cron job you have scheduled.

This approach will get the job done, but it takes up a lot of the engineer’s time, the chances of errors creeping in are high, and the chances of that engineer looking for a different job are even higher.

2. Manually checking the output logged to a file

You could argue that one of the best practices we mentioned was logging the output to a file. But the log file is meant for debugging once you see an issue, not for monitoring.

The one problem I see with cron jobs is that they fail silently. So it becomes even more important to log failures, which can be used for debugging later.
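To make failures visible in the log, a job can be wrapped so that its start, end, duration and exit status are always recorded. This is a minimal sketch; the `run_logged` name and the log line format are assumptions, not an established convention.

```shell
#!/bin/sh
# Logging wrapper sketch: record start, end, duration and exit status of a job.
# run_logged NAME LOGFILE COMMAND [ARGS...]
run_logged() {
  name=$1
  log_file=$2
  shift 2
  started=$(date +%s)
  echo "$(date '+%Y-%m-%d %H:%M:%S') [$name] START: $*" >> "$log_file"
  status=0
  "$@" >> "$log_file" 2>&1 || status=$?       # capture output and errors of the job
  elapsed=$(( $(date +%s) - started ))
  echo "$(date '+%Y-%m-%d %H:%M:%S') [$name] END status=$status duration=${elapsed}s" >> "$log_file"
  return "$status"                            # pass the job's exit status through
}
```

A START line without a matching END line then points to a job that hung or was killed, and a non-zero status points to a failure.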

But this is a manual approach and would not work in today’s world.

3. Heartbeat based solutions

This means sending a signal to an app whose sole job is to listen for that signal and notify you if it does not receive it within a certain time range. The notification can be an SMS, a phone call, an alert on a dashboard, etc. There are 2 ways to go about this.

First, you can build this service yourself, which might be cheaper in the short term, but it also means introducing another point of failure in your design and another app that your engineering team has to maintain. Second, you can use one of the SaaS providers offering this kind of solution, like Cronitor. This makes your life easier in the long run.
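Whichever option you choose, the job side of a heartbeat usually looks similar: run the job, and ping a URL only if it succeeded. The sketch below assumes `curl` is installed; the heartbeat URL is a hypothetical placeholder and does not belong to any real provider.

```shell
#!/bin/sh
# Heartbeat wrapper sketch. HEARTBEAT_URL is a hypothetical endpoint.
HEARTBEAT_URL="${HEARTBEAT_URL:-https://heartbeat.example.com/ping/nightly-backup}"

# Send one HTTP request to the monitoring service.
send_ping() {
  curl -fsS --max-time 10 --retry 3 -o /dev/null "$1"
}

# run_with_heartbeat COMMAND [ARGS...]
run_with_heartbeat() {
  if "$@"; then
    # Job succeeded: report in. A failed ping must not fail the job itself.
    send_ping "$HEARTBEAT_URL" || echo "heartbeat ping failed" >&2
    return 0
  else
    job_status=$?
    echo "job failed with status $job_status; no heartbeat sent" >&2
    return "$job_status"
  fi
}
```

Because no ping is sent on failure, a crashed job, a hung job and a job that never started all look the same to the monitoring service: a missing heartbeat, which is what triggers the alert.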

4. Using Doctor Droid

Dr. Droid provides a stateful approach to monitoring the health of your cron jobs. Using their platform, you can set up monitors that alert you when cron jobs fail to complete or take longer than usual. It is easy to use and offers 1 million events / month for free, so for most regular use cases you should be covered.

Conclusion

In conclusion, cron jobs can be a powerful tool for automating repetitive tasks, but they can also create problems if not managed effectively. By following best practices for monitoring and managing cron jobs, developers can ensure that their applications continue to run smoothly and efficiently.