It's 2018! Let Containers Manage Your Datacenter
Containers are changing the dynamics of the modern data center. The technology is growing quickly and drawing widespread attention across enterprise IT. One of the primary reasons for the rise and adoption of containers is that they let developers move faster. Compared to VMs, which take minutes to stand up, containers typically start in about a second or less. As organizations prioritize shipping new products and features faster to keep up in the software-eaten world, developers are favoring technology that lets them scale applications and deploy resources much faster than traditional VMs on public and private clouds can support.
Docker containers bring a variety of cost and performance benefits. They let you run multiple applications on the same server or OS without a hypervisor, which eliminates the hypervisor's drag on system resources and gives your workloads a lighter footprint. The overhead of the container itself is close to zero, because it is simply a boundary of permissions and resources within Linux. Containers also start up and shut down very rapidly compared to virtual machines, a perfect fit for today's short-lived, ephemeral workloads, which are often tied to real-world events.
Docker is an open platform that makes applications and workloads more portable and easier to distribute in an effective, standardized way. Combining Docker containers and microservices with software-defined infrastructure makes the datacenter more agile and allows quick resource reallocation. Hence, this architecture works well for improving datacenter operations.
Most DellEMC server management software offerings, as well as the entire Software Defined Infrastructure, are built on Redfish, a standard RESTful management interface. Redfish is a next-generation systems management standard that exposes a data model through a hypermedia RESTful interface over HTTPS. The data model is defined in a standard, machine-readable schema, message payloads are expressed in JSON, and the protocol is based on OData v4. Because it is a hypermedia API, Redfish can represent a variety of implementations through a consistent interface. It has mechanisms for discovering and managing data center resources, handling events, and managing long-lived tasks. It is easy to implement and easy to consume, it offers scalability advantages over older technologies, and it is usable by clients, scripts, and browser-based GUIs.
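To make the hypermedia idea concrete, here is a minimal sketch of how a client might follow the links in a Redfish collection. The `Members` and `@odata.id` keys and the `/redfish/v1/Systems` path come from the Redfish specification; the sample payload, the host address, and the helper names are illustrative assumptions and are not taken from OpenUSM.

```python
import json
from urllib.parse import urljoin

# Sample payload shaped like a Redfish collection resource.
# The member URI is illustrative; real values depend on the server.
SAMPLE_SYSTEMS_COLLECTION = """
{
  "@odata.id": "/redfish/v1/Systems",
  "Members@odata.count": 1,
  "Members": [
    {"@odata.id": "/redfish/v1/Systems/System.Embedded.1"}
  ]
}
"""


def member_uris(collection: dict) -> list:
    """Return the @odata.id link of every member in a Redfish collection."""
    return [member["@odata.id"] for member in collection.get("Members", [])]


def absolute_urls(base_url: str, uris: list) -> list:
    """Resolve relative Redfish links against the management controller URL."""
    return [urljoin(base_url, uri) for uri in uris]


collection = json.loads(SAMPLE_SYSTEMS_COLLECTION)
links = member_uris(collection)
# 192.0.2.10 is a documentation-only address standing in for an iDRAC IP.
urls = absolute_urls("https://192.0.2.10", links)
print(urls)
```

A real client would fetch each of these URLs over HTTPS with credentials and keep following `@odata.id` links, rather than hard-coding resource paths.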
What is OpenUSM?
OpenUSM stands for Open Universal Systems Manager. It is a multi-tool product, like a Swiss Army knife: a suite of open source tools and scripts that uses containers and related tooling to perform server management, monitoring, and insight log analytics. It is a 100% container-based solution that relies heavily on Docker to build microservices for system management tasks such as BIOS token changes and firmware updates. It is a completely out-of-band system management solution based purely on the Redfish API. It is platform agnostic (it can run from a laptop, a server, or the cloud) and works on any Linux or Windows platform that runs Docker Engine.
OpenUSM is a project created three months ago by Ajeet Singh Raina & Avinash Bendigeri, a DellEMC employee.
OpenUSM follows an easy deployment model. It uses developer tools such as Docker Compose and the Docker CLI to bring up the microservices that carry out system management tasks. It uses modern tools and technologies and integrates well with near-real-time search and analytics tools like the ELK stack. It enables sensor log analytics and visualization for operations teams through the popular Grafana tool. It can scale both vertically and horizontally. Because it is completely open source, you are free to build and customize it to fit your needs, and its components and functionality are plug-and-play.
Technology Overview of OpenUSM:
OpenUSM is an integrated container solution that brings three basic functionalities: system management, monitoring, and insight log analytics. The system management stack sits in the middle of the architecture and uses Python, Redfish, Django, Docker, and Docker Compose to carry out system management tasks. On the left-hand side of the architecture is the monitoring stack, which uses Prometheus, Grafana, Alertmanager, and Pushgateway to retrieve GPU and sensor metrics. On the right-hand side is the open source version of Elasticsearch, Logstash, and Kibana, which handles sensor, Lifecycle Controller, and SEL logs.
Understanding OpenUSM System Management Workflow
OpenUSM uses a “Container-Per-Server (CPS)” model. For each server management task, there are scripts that, when executed, build and run a Docker container for each server. OpenUSM uses the Redfish API exclusively to communicate directly with the Dell iDRAC, collects the iDRAC/LC logs, and pushes them to a syslog server. Logstash collects the logs from the syslog server and pushes them to Elasticsearch, and finally they are visualized in Kibana. OpenUSM uses the Prometheus stack to monitor system components such as GPUs and CPUs, using nvidia-docker and node exporter.
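The container-per-server idea can be sketched as a function that plans one `docker run` invocation per iDRAC address. This is only an illustration under stated assumptions: the image name `openusm-task:latest`, the `IDRAC_IP` environment variable, and the naming scheme are hypothetical and are not taken from the OpenUSM scripts.

```python
import shlex

# Hypothetical image name; OpenUSM's real image names may differ.
IMAGE = "openusm-task:latest"


def container_name(idrac_ip: str) -> str:
    """Derive a container name from the iDRAC IP (dots replaced for readability)."""
    return "openusm-" + idrac_ip.replace(".", "-")


def docker_run_command(idrac_ip: str, image: str = IMAGE) -> list:
    """Build the argv for one container dedicated to one server."""
    return [
        "docker", "run", "--rm", "-d",
        "--name", container_name(idrac_ip),
        "-e", f"IDRAC_IP={idrac_ip}",  # hypothetical variable read by the task script
        image,
    ]


def plan(idrac_ips: list) -> list:
    """One docker run command per server: the container-per-server idea."""
    return [docker_run_command(ip) for ip in idrac_ips]


for argv in plan(["192.0.2.10", "192.0.2.11"]):
    # Printed rather than executed, so the sketch runs without Docker installed.
    print(shlex.join(argv))
```

Giving each server its own container keeps one server's failure or timeout from blocking the others, and the containers can be started in parallel.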
In this blog post, I am going to demonstrate how OpenUSM simplifies overall system management with the help of Docker containers and Redfish. For this demonstration, I will use an Ubuntu 17.10 VM running on my ESXi 6.0 system. The code should also work on any other Linux or Windows platform.
The source code for this project is open to the public and is available at https://github.com/openusm/openusm
Cloning the Repository
If you already have Docker installed on your system, you can skip to the next step. If not, run the command below to install Docker and Docker Compose on your system.
Depending on your network connectivity, this step will take 1-2 minutes to complete.
OpenUSM is a 100% containerized solution, so we will run ELK inside Docker containers. To keep it simple, we designed a docker-compose file that can get you started in a matter of seconds.
Execute the same bootstrap file to bring up the ELK stack as shown below:
Wait 30-40 seconds for the ELK stack to come up.
Verifying the ELK services
Run the below command to check if ELK services are up and running:
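The same check can also be scripted. The sketch below only tests whether the default Elasticsearch (9200) and Kibana (5601) ports accept TCP connections; it assumes the compose file publishes those defaults on the Docker host, which may not match your setup, and the function names are illustrative.

```python
import socket

# Default ports of each service; a compose file may publish different ones.
DEFAULT_PORTS = {"elasticsearch": 9200, "kibana": 5601}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str, ports: dict = DEFAULT_PORTS) -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, up in check_services("127.0.0.1").items():
        print(f"{name}: {'up' if up else 'down'}")
```

An open port only shows that something is listening; it does not prove that the service behind it has finished starting.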
Pushing DellEMC iDRAC Logs to ELK Stack
In this section, I am going to demonstrate how easy OpenUSM makes it to push the logs of DellEMC system management tasks to a centralized ELK stack and visualize them in Kibana. Let us pick a simple “BIOS Token Change” function and apply it to a multitude of DellEMC servers.
To keep it simple, we designed a script named “bios-token.py”, which is placed in the root of the OpenUSM Git repository. Let us first look at the parameters that can be supplied to the bios-token.py script:
As shown above, the script can target a single Dell server or a multitude of DellEMC servers listed in a plain text file. We are currently looking at an autodiscovery feature to automate this. You need to provide the NFS server IP, the share name, and the BIOS token configuration file as arguments for the script to run successfully. Once invoked, the script creates one Docker container per DellEMC server, collects the iDRAC logs from each server, and pushes them to a syslog server that itself runs inside a Docker container. Logstash collects the logs from syslog and stores them in Elasticsearch so that they can be visualized in the Kibana UI.
/var/nfsshare => NFS share
ips.txt => list of DellEMC iDRAC IPs
biosconfig.xml => XML defining the BIOS token entries
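Since a single bad line in the IP list could send a container off to the wrong target, it can help to validate the file before launching anything. Here is a minimal sketch, assuming one address per line; skipping blank lines and `#` comments is a convention of this example, not something documented for bios-token.py.

```python
import ipaddress


def parse_ip_list(text: str) -> list:
    """Return valid IP addresses from text with one address per line.

    Blank lines and lines starting with '#' are skipped; the comment
    convention is an assumption of this sketch.
    """
    ips = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        try:
            ips.append(str(ipaddress.ip_address(line)))
        except ValueError:
            raise ValueError(
                f"line {lineno}: {line!r} is not a valid IP address"
            ) from None
    return ips


# Documentation-only addresses standing in for real iDRAC IPs.
sample = """
# rack 1
192.0.2.10
192.0.2.11
"""
print(parse_ip_list(sample))
```

Failing fast with the offending line number is usually friendlier than discovering a typo after several containers have already started.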
By now, you should be able to see the iDRAC logs visualized in the Kibana UI. The Kibana UI can be customized extensively, for example to display the logs on a per-server basis.
Insight Log Analytics (LC, SEL & Sensor Logs)
Digging further into OpenUSM, insight log analytics is an interesting use case and a robust capability. A DellEMC server generates a variety of logs, such as System Event Logs (SEL), RAID controller logs, and Lifecycle Controller (LC) logs. When a system event occurs on a managed system, it is recorded in the System Event Log (SEL). The SEL page displays a system health indicator, a time stamp, and a description for each logged event. The same SEL entry is also available in the Lifecycle Controller (LC) log.
Consider a use case where a datacenter administrator wants to collect the LC logs for the past year. That requires a robust, modern tool that can collect such a large volume of data and analyze it.
We recently designed a script that simplifies this kind of log analytics using Docker, Redfish, and ELK. You can find the “lclogexporter.py” script in the root of the GitHub repository:
This script requires the Elasticsearch IP address, credentials, and a list of iDRAC IPs to push all iDRAC LC logs to the ELK stack. Whenever you execute the script, it first builds a Docker image called “openusm-analytics” and then runs a container from it, which automatically pushes all LC, SEL, and sensor logs to the ELK stack.
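To give a feel for what pushing log entries to Elasticsearch can involve, the sketch below serializes a few entries into the newline-delimited JSON format accepted by the Elasticsearch `_bulk` endpoint. It is not the lclogexporter.py implementation: the index name `openusm-lclogs` and the entry fields are hypothetical, and nothing is actually sent over the network.

```python
import json


def bulk_body(index: str, entries: list) -> str:
    """Serialize log entries as newline-delimited JSON for the _bulk endpoint.

    Each document is preceded by an action line naming the target index,
    and the body ends with a newline.
    """
    lines = []
    for entry in entries:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(entry))
    return "\n".join(lines) + "\n"


# Illustrative entries; field names are assumptions, not the iDRAC schema.
lc_entries = [
    {"idrac": "192.0.2.10", "severity": "OK", "message": "System powered on"},
    {"idrac": "192.0.2.10", "severity": "Warning", "message": "Fan speed low"},
]

# "openusm-lclogs" is a hypothetical index name.
body = bulk_body("openusm-lclogs", lc_entries)
print(body)
```

The resulting body would be sent with an HTTP POST to the `_bulk` endpoint using the `application/x-ndjson` content type. Elasticsearch releases before 7.x also expect a document type, so the action line or URL may need adjusting for the version in use.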
Below is the Kibana UI showing a pie chart of the Dell Lifecycle Controller logs collected over the past year.
Insight Log Metrics (Sensor Logs) using Grafana
Did you find OpenUSM interesting? OpenUSM is just a three-month-old project, and we are looking for contributors from across the globe. If you think the project looks cool, come and join us to make it more robust. We welcome your participation.