IT Automation through Ansible

Ansible is an IT automation tool. It can configure systems, deploy software, and orchestrate more advanced IT tasks such as continuous deployments or zero downtime rolling updates.

One of the main reasons I chose Ansible over other IT automation tools like Puppet and Chef is that it manages machines in an agentless manner. As OpenSSH is one of the most peer-reviewed open source components, the security exposure of using the tool is greatly reduced. Ansible is decentralized: it relies on your existing OS credentials to control access to remote machines; if needed, it can easily connect with Kerberos, LDAP, and other centralized authentication management systems.

Today we are going to get started with Ansible. This guide is built for beginners who want to get their hands dirty with Ansible. Let's begin.

Setting up Ansible Server and Client:

We have three machines. I assume that Ansible is already running on machine1. In this post we are going to install packages and make a few configuration changes on the DB machine (machine2). I will cover the web part in the next post.

Here are the machine details:

Machine1> –

Machine2> – ( DB)

Machine3> –

As shown below, create two folders – db and web under /etc/ansible/roles.


The handlers, tasks and templates directories are the recommended structure for an Ansible role; the respective YAML files are placed under them.

Don't worry about the contents as of now. We just need the empty directory and file structure.
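The empty skeleton can be created in one go. A minimal sketch, using /tmp/ansible-demo as an illustrative base path instead of /etc/ansible so it can be tried as a non-root user:

```shell
# Create the recommended role layout for the db and web roles.
# The base path is illustrative; on a real control node it would be /etc/ansible.
BASE=/tmp/ansible-demo
for role in db web; do
    mkdir -p "$BASE/roles/$role/handlers" \
             "$BASE/roles/$role/tasks" \
             "$BASE/roles/$role/templates"
    # Each role gets an (empty for now) entry-point tasks file.
    touch "$BASE/roles/$role/tasks/main.yml"
done
find "$BASE" -type d
```

Running this lists the freshly created role directories; the YAML content is filled in in the following steps.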

Let's start by creating the first file, called playbook.yml:

root@ansible-host:/etc/ansible# cat playbook.yml
---
# Common role playbook
- hosts: all
  tasks: []

- hosts:
  sudo: yes
  roles:
    - db

As shown above, playbook.yml sits under /etc/ansible and contains the list of machines where the deployment has to be done. A small typo above – the second .194 machine is actually 196.

Under each host, we mention the role names so that the specific role is invoked when the deployment phase is initiated on each host.

Let’s talk about DB contents first:

Folder: handlers

root@ansible-host:/etc/ansible/roles/db# cat handlers/main.yml
---
- name: start mysql
  service: name=mysql state=started
- name: restart mysql
  service: name=mysql state=restarted
root@ansible-host:/etc/ansible/roles/db/tasks# cat install.yml
---
- name: Install mysql
  apt: name={{ item }} state=latest
  with_items:
    - mysql-server
    - python-mysqldb
    - php5-mysql
    - libapache2-mod-auth-mysql
  notify: start mysql

root@ansible-host:/etc/ansible/roles/db/tasks# cat mysql_secure_installation.yml
---
- name: create mysql root pass
  command: /usr/bin/openssl rand -base64 16
  register: mysql_root_passwd

- name: update mysql root passwd
  mysql_user: name=root host={{ item }} password={{ mysql_root_passwd.stdout }}
  with_items:
    - "{{ ansible_hostname }}"
    - ::1
    - localhost

- name: copy user my.cnf file with root passwd credentials
  template: src=dotmy.cnf.j2 dest=/root/.my.cnf owner=root group=root mode=0600

- name: delete anonymous mysql user
  mysql_user: name="" state=absent

- name: remove mysql test database
  mysql_db: name=test state=absent

- name: create database blog
  mysql_db: name=blog state=present

- name: create database user with name 'blog' and password 'blog' with all DB privileges and with GRANT options
  mysql_user: name=blog password=blog priv=*.*:ALL,GRANT state=present
root@ansible-host:/etc/ansible/roles/db/tasks# cat main.yml
---
- include: install.yml
- include: mysql_secure_installation.yml
root@ansible-host:/etc/ansible/roles/db/templates# cat dotmy.cnf.j2
[client]
user=root
password={{ mysql_root_passwd.stdout }}

root@ansible-host:/etc/ansible/roles/db/templates# cat my.cnf.j2
[client]
password={{ mysql_root_passwd.stdout }}
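To see roughly what Ansible's template module will produce from dotmy.cnf.j2, you can mimic the substitution locally. A rough stand-in using sed in place of Jinja2; the password value 'S3cretPass' is made up for the demo (on the real run it is the openssl-generated value registered by the playbook):

```shell
# Recreate the template locally and render it the way the template module would.
cat > /tmp/dotmy.cnf.j2 <<'EOF'
[client]
user=root
password={{ mysql_root_passwd.stdout }}
EOF
# 'S3cretPass' stands in for the registered mysql_root_passwd.stdout value.
sed 's/{{ mysql_root_passwd.stdout }}/S3cretPass/' /tmp/dotmy.cnf.j2 > /tmp/demo.my.cnf
chmod 600 /tmp/demo.my.cnf
cat /tmp/demo.my.cnf
```

The mode=0600 in the task matters because this file carries the root database password.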

That’s all for DB to function well.

Hence, we are ready to execute the commands on the remote machine through Ansible.

root@ansible-host:/etc/ansible# ansible-playbook playbook.yml

PLAY [all] ********************************************************************

GATHERING FACTS ***************************************************************

ok: []

PLAY [] **********************************************************

GATHERING FACTS ***************************************************************
ok: []

TASK: [db | Install mysql] ****************************************************
changed: [] => (item=mysql-server,python-mysqldb,php5-mysql,libapache2-mod-auth-mysql)

TASK: [db | create mysql root pass] *******************************************
changed: []

TASK: [db | update mysql root passwd] *****************************************
changed: [] => (item=ansible-client)
changed: [] => (item=
changed: [] => (item=::1)
changed: [] => (item=localhost)

TASK: [db | copy user my.cnf file with root passwd credentials] ***************
changed: []

TASK: [db | delete anonymous mysql user] **************************************
ok: []

TASK: [db | remove mysql test database] ***************************************
ok: []

TASK: [db | create database blog] *********************************************
changed: []

TASK: [db | create database user with name 'blog' and password 'blog' with all DB privileges and with GRANT options] ***
changed: []

NOTIFIED: [db | start mysql] **************************************************
ok: []

PLAY [] **********************************************************

GATHERING FACTS ***************************************************************
ok: []

TASK: [web | Install web] *****************************************************
changed: [] => (item=python-pip,python-mysqldb)

TASK: [web | install tornado and torndb] **************************************

PLAY RECAP ********************************************************************
                    : ok=13   changed=7    unreachable=0    failed=0

Done. Now you can easily SSH to the remote DB machine and check that MySQL has been installed successfully.

Isn't that magic? You didn't need any agent running on the remote machine; OpenSSH alone enables the IT automation.

Catch you in my next post.


It was an Openstack Day…OSI 2014

Yesterday I attended the Open Source India 2014 event at the NIMHANS Convention Center, Bengaluru. If you are new to OSI, let me tell you that Open Source India is around 11 years old and, believe me, it is Asia's largest convention on open source technology. Founded by the EFI group, its main motto is to bridge the gap between industry and the open source community. The event is a step toward presenting successful open source implementations in the form of keynotes, discussions and workshops.


I reached the venue at around 9:30 AM and was well aware of the event schedule. The two-day event featured tracks on topics including Web App Development, Mobile App Development, IT Infrastructure Day, Cloud Day, Kernel Day, Database Day, FOSS For Everyone, IT Implementation Success Stories and an OpenStack Mini Conference.

HP was the platinum partner, while Microsoft, MongoDB, Wipro, Oracle and Zimbra were the other vendors ready in their booths to welcome you with their open source offerings. I took the first 20 minutes visiting each booth to get a glimpse before entering Hall-1 for the morning keynote.


Rajeev Pandey, an HP Distinguished Technologist, started the keynote on "A Deployment Architecture for OpenStack in the Enterprise". He talked about the HP Helion Cloud and its offerings, with the disclaimer that since this was an open source event, he would stick to HP's open source offerings.


"FOSS Adoption in four classes of institutions in India" was the next topic of discussion. The speaker shared a very interesting survey on open source adoption by research organizations, higher education, government and IT SMEs. The survey is public and published under

The session titled "Free & Open Source Enterprise Linux" by Kamal Dodeja, Global Sales Consulting Manager, Oracle India, was well presented and very informative. The speaker talked about Oracle's contributions toward XFS, MySQL, VirtualBox, OpenJDK, Xen, Java, .Net, DTrace, Eclipse, Metro, InnoDB and GlassFish.


I raised a question regarding RHEL 7's recent replacement of MySQL with MariaDB, and the speaker sounded convincing: if you are a MySQL user looking for support, you come to Oracle; if you want to play around with the code, explore MariaDB. It was interesting to learn that even though the MySQL core is still open source, OpenStack's recent Icehouse release recommends MariaDB rather than MySQL. This was a very interactive session, and it was good to see how Oracle has preserved its open source offerings after the Sun acquisition.


Then came the tea break, and lunch too. I skipped a session on Wikipedia as I wanted to be ready for the post-lunch events. The HP Helion sales team knows how to sell their product and had tech challenges in place. Soon I left for the next event, the OpenStack Mini Conf.

I bagged an HP Helion jacket by answering a query related to Rackspace. It was a great feeling altogether and a good interactive session. It was followed by "State of the Dolphins & the Penguins" by Sanjay Manwani, MySQL India Director, and Ramesh Srinivasan, Senior Director, Oracle Linux and Virtualization. This session covered the complete history of Oracle's open source movement to date, with a timeline of their FOSS contributions.

"OpenStack Development and Contribution Workflow" by Swapnil Kulkarni was the next topic, one I was eagerly waiting for. The speaker walked through a step-by-step OpenStack setup and how to contribute to OpenStack through Git. Though it was a complete demonstration, the small font of the Linux commands left the audience a bit bored. However, I appreciate his knowledge and subject matter expertise. I answered a couple of questions related to GitHub during this session.

I skipped tea in between, as I was eagerly waiting for the next big session, "OpenStack Nova Deep Dive and Nova Instance Management Lifecycle". This was truly a great session. Anil Bidari, a trainer at Cloudenabled, presented "Behind the scenes while you fork any instance" very well. He referred to one of his OpenStack demos, which he has made available for everyone. I am eagerly waiting for his recorded presentation. Simply an awesome presentation!

"Ironic", something very new to me, was the next topic, from the HP Helion group. Ironic is a bare-metal provisioning tool which has recently been incubated into the Icehouse edition. The presentation was very descriptive and the Ironic architecture was presented well. I had a couple of questions, as it seemed very similar to Puppet Razor, and they were answered convincingly.


It was 6:00 PM and the event was soon to be wrapped up for the day. But I still had enough energy to listen to more speakers. Last but not least was something I was keen to attend: "Docker as hypervisor driver for OpenStack Nova Compute".

Docker is an open platform for developers and system administrators to build, ship and run distributed applications. It is something that is coming up fast and could be a threat to the virtualization world. To explain in simple words: with a Fedora VM you use the user libraries/binaries plus a kernel; with a Fedora Docker container, you need just the user libraries (and not the kernel components). It is a concept very closely related to user namespaces.

Overall, the event was very informative. I met a couple of college students and cloud experts, and interacted with the vendors.


Installing Skype on CentOS 6.5

Getting Skype working on CentOS 6.5 has been a daunting job for a lot of system administrators. I often found system administrators posting this query on Facebook and in other Linux groups. This led me to try out Skype on one of the available CentOS 6.5 machines. This guide should work for RHEL and the latest Fedora versions too.

Important Note published by CentOS Team:

“…Starting with Aug 4th 2014, no version of Skype older than 4.3 works due to the changes that were implemented in the authentication mechanism. Any attempt to use older versions lead to an error message similar to “Cannot contact server” ( the exact message varies depending on the version that was used )…”

1. Login to CentOS 6.5 machine


2. Update the EPEL repository:


3. Ensure that the latest epel release is installed as shown below:


4. Download Skype 4.3 from the Skype website and unzip it under the /usr/src folder:


5. Rename the unzipped directory to skype:


6. Create the below file:
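The exact contents of this file aren't shown above; typically it is a small launcher wrapper, so the following is an assumption based on common Skype 4.3 guides. It is written to /tmp for illustration; on the real system it would be /usr/bin/skype, with Skype unpacked under /usr/src/skype:

```shell
# Hypothetical launcher wrapper -- contents assumed, not taken from the original post.
cat > /tmp/skype-wrapper <<'EOF'
#!/bin/sh
export SKYPE_HOME="/usr/src/skype"
exec "$SKYPE_HOME/skype" --resources="$SKYPE_HOME" "$@"
EOF
chmod 755 /tmp/skype-wrapper
cat /tmp/skype-wrapper
```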


7. Modify the permission of the file as instructed:


8. Provide the appropriate links:


9. Ensure that these 32-bit packages are installed. (Please note that Skype installation on CentOS and Ubuntu usually fails due to 32-bit package compatibility issues.)


10. Skype is now ready to run. You can run skype directly from the command line, or open it through the GUI if you are not connecting through putty:


Still finding difficulty getting Skype working? Post your enquiry at


Automating Oracle Weblogic Server installation through shell script

Automation always saves you considerable time, especially when you have to follow the same steps for hundreds of machines; automated scripts and tools have always been a great weapon for system administrators. Today I spent considerable time setting up WebLogic Server 10.3.6 on my CentOS 7.0 machine through a shell script. This unattended script uses autoexpect rather than silent.xml or WLST as suggested by Oracle. Let me share the steps I followed to deploy the WebLogic Server:

Ensure you have the following software in place, downloaded from the Oracle website. We can't use wget for this, as downloading these pieces of software requires an Oracle login.

Links for Software Downloads:

a. Download jdk-7u67-linux-i586.gz from
b. Download from

Once you download the above software, create a directory called /softwaretmp/ and place the downloads in it:

#mkdir /softwaretmp/

#cd /softwaretmp/


2. Create an empty file called and paste the following shell script (shown below):

echo "Checking if WebLogic Server is already running. If it is running, stopping it and reinstalling from scratch"
pkill -9 java
pkill -9 Weblogic
rm -fr /u01/oracle/wlsdomains/base_domain
rm -fr /u01/oracle/fmw/wlserver_10.3
rm -fr /u01/jdk/jdk7
pkill -9 java
echo "Initializing the Installation"
groupadd orainstall
useradd -g orainstall oracle
mkdir -p /u01/jdk
cd /u01/jdk
tar -zxvf /softwaretmp/jdk-7u67-linux-i586.gz
ln -s jdk1.7.0_67 jdk7
mkdir -p /u01/oracle/fmw/wlserver_10.3
cp -rf /softwaretmp/ /u01/oracle/fmw/wlserver_10.3/
echo "
MW_HOME=/u01/oracle/fmw/wlserver_10.3; export MW_HOME
JAVA_HOME=/u01/jdk/jdk7; export JAVA_HOME
" >> ~/.bash_profile
cd ~
. ./.bash_profile
cd /u01/oracle/fmw/wlserver_10.3
. $MW_HOME/wlserver/server/bin/
mkdir -p /u01/oracle/wlsdomains
cp -rf /softwaretmp/script.exp /u01/oracle/fmw/wlserver_10.3/wlserver/common/bin/
cd /u01/oracle/fmw/wlserver_10.3/wlserver/common/bin
echo "Expect Script is well run"
cd /u01/oracle/wlsdomains/base_domain/
mkdir -p servers/AdminServer/security
mkdir -p servers/managedserver_1/security
cd servers/AdminServer/security
" >>
cd /u01/oracle/wlsdomains/base_domain/servers/managedserver_1/security
" >>
echo "Weblogic Server 10.3.6 Configuration is all done !!"
echo "Starting the Weblogic Server"
cd /softwaretmp/

3. You need to add two more scripts to the /softwaretmp directory:

Create an empty file called and paste the content below:

#!/usr/bin/expect -f
cd /u01/oracle/wlsdomains/base_domain/bin/
spawn ./
expect "Enter username to boot WebLogic server: "
send "weblogic\r"
expect "$ "
expect "Enter password to boot WebLogic server: "
send "Oracle9ias\r"
expect "$ "
send "exit\r"

Also, create another empty file called and paste the lines below:

#!/usr/bin/expect -f
cd /u01/oracle/wlsdomains/base_domain/bin/
spawn ./ managedserver_1
expect "Enter username to boot WebLogic server: "
send "weblogic\r"
expect "$ "
expect "Enter password to boot WebLogic server: "
send "Oracle9ias\r"
expect "$ "
send "exit\r"

Save the file.

4. Now comes the important step: you need to create the script.exp file through a series of steps, as follows:

a. Ensure that autoexpect is installed (it ships with the expect package) via the following command:

# yum install expect

b. Start the autoexpect tool through the following command:

#autoexpect -s

autoexpect will report that it has started and that the session will be saved to script.exp.

c. Now run the following command under /u01/oracle/fmw/wlserver_10.3/wlserver/common/bin

#cd /u01/oracle/fmw/wlserver_10.3/wlserver/common/bin


Follow the general steps for selecting the right options as per your infrastructure.

Once completed, ensure you run the following command:


autoexpect stopped.

Once you have completed the above steps, script.exp gets created; it has to be copied to the /softwaretmp directory.

Still finding difficulty? Post your questions at


How to integrate Redmine with Git?

Redmine, built on Ruby on Rails, is an impressive free and open source web-based project management tool. I have been using Trac for quite some time and find Redmine very similar.

One of my colleagues was having difficulty integrating Redmine with Git. I decided to help him from scratch, and it went flawlessly.

I had VMware Workstation running on my Inspiron. I installed CentOS 6.3, but the steps should work on CentOS 6.2 too. I followed the steps below:

Installing Pre-requisite Packages

# yum install openssl-devel zlib-devel gcc gcc-c++ automake autoconf readline-devel curl-devel expat-devel gettext-devel patch mysql mysql-server mysql-devel httpd httpd-devel apr-devel apr-util-devel libtool apr

Install Ruby on Rails

Download Ruby source code from


# tar xzvf ruby-1.9.3-p194.tar.gz
# cd ruby-1.9.3-p194

# ./configure --enable-shared

# make

# make install

# ruby -v

ruby 1.8.7 (2009-12-24 patchlevel 248) [x86_64-linux], MBARI 0x6770, Ruby Enterprise Edition 2010.01

Note: if ruby -v still reports an older system Ruby (as above), ensure that /usr/local/bin precedes /usr/bin in your PATH so the freshly built 1.9.3 is picked up.

Install Ruby Gems


# tar xvzf rubygems-1.8.24.tgz

# cd rubygems-1.8.24

# ruby setup.rb

# gem -v


# gem install rubygems-update
# update_rubygems

Install Rake

# gem install rake

Install Rails

# gem install rails

Install Passenger

#gem install passenger

Install Redmine

Download the redmine software through

cd /usr/local/share

# wget

# tar xzvf redmine-2.0.3.tar.gz

# cd redmine-2.0.3

# ln -s /usr/local/share/redmine-2.0.3/public /home/code/public_html/redmine

Configure MySQL

Create the Redmine MySQL database:

# mysql -u root -p

mysql> create database redmine character set utf8;

mysql> create user 'redmine'@'localhost' identified by 'Pass123';
mysql> grant all privileges on redmine.* to 'redmine'@'localhost';

Configure database.yml:

# cd /usr/local/share/redmine-2.0.3

# vi config/database.yml

production:
  adapter: mysql
  database: redmine
  host: localhost
  username: redmine
  password: Pass123

Generate a session store secret:

# gem install -v=0.4.2 i18n

# gem install -v=0.8.3 rake

# gem install -v=1.0.1 rack

# rake generate_session_store

While running the last command (shown above) you might encounter error messages related to rmagick.

We can skip ImageMagick completely and execute the following command:

# bundle install --without development test rmagick

Set up permissions:

# chown -R apache:apache /usr/local/share/redmine-2.0.3

# find /usr/local/share/redmine-2.0.3 -type d -exec chmod 755 {} \;

# find /usr/local/share/redmine-2.0.3 -type f -exec chmod 644 {} \;

Configuring Virtual Host

Your apache configuration for virtualhost should look like this:

ServerName codebinder.com
ServerAlias www.codebinder.com
RailsBaseURI /
RailsEnv production
DocumentRoot /home/code/public_html/redmine/public
<Directory /home/code/public_html/redmine/public>
    Options -MultiViews
</Directory>


Open the site in your browser and you will be able to see the Redmine page successfully.

Installing Git

# yum install git git-core

Redmine User Guide

Open and you will see this page:

For the first time, admin/admin are the credentials.

You will find redmine default page. Click on Administration.

Click on Settings option.

Choose Repositories.

If you have Subversion, Darcs, Mercurial, CVS or Git installed, you will find the path enabled.

Since Git is installed on the Linux machine, the path /usr/bin/git will be displayed.

How to Create a New Project?

Click on New Project.

Let’s create a new project as shown below:

Click on Create and Continue once you have entered all the needed details.

You will be able to see the foo project information by clicking on the Overview tab.

We are going to import git repository for this project.

Under Settings > Repositories, click on New repository as shown in the slide above.

A Contractor can create a repository called “foo” as shown below:

So we have a Git repository created at /home/code/gitrepos/foo/.git, which we need to add on the Redmine page as shown below.
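If you need to create such a repository yourself first, a bare repository works well as a Redmine-facing store. A quick sketch; the /tmp/gitrepos path is purely illustrative (the post uses /home/code/gitrepos):

```shell
# Create a bare repository -- no working tree, suitable as a shared/integration repo.
mkdir -p /tmp/gitrepos
git init --bare /tmp/gitrepos/foo.git
ls /tmp/gitrepos/foo.git
```

The absolute path to this repository is what goes into the Redmine repository settings.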

Hence, you can see that the Git repository has been created and integrated successfully with Redmine.

Hope I have put all the steps very clearly.


Running Hadoop on Ubuntu 14.04 ( Multi-Node Cluster)

This is an introductory post on Hadoop for beginners who want step-by-step instructions for deploying Hadoop on the latest Ubuntu 14.04 box. Hadoop allows for the distributed processing of large data sets across clusters of computers using the MapReduce programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

HDFS is the distributed file system that is available with Hadoop. MapReduce tasks use HDFS to read and write data. An HDFS deployment includes a single Name Node and multiple Data Nodes. In this section, we will set up a Name Node and multiple Data Nodes.

Hadoop Architecture Design:

Machine IP      Type of Node      Hostname
                Master Node
                Data Node 1
                Data Node 2

Let’s talk about YARN..

In simple language, YARN is basically Hadoop's next-generation MapReduce, also called MapReduce v2. In short, it is a cluster management technology. YARN combines a central Resource Manager, which reconciles the way applications use Hadoop system resources, with Node Manager agents that monitor the processing operations of individual cluster nodes.

The fundamental idea of YARN is to split up the two major functionalities of the Job Tracker, resource management and job scheduling/monitoring, into separate daemons. YARN splits the two major responsibilities of the Job Tracker/Task Tracker into separate entities:

  • a global Resource Manager
  • a per-application Application Master
  • a per-node slave Node Manager
  • a per-application Container running on a Node Manager

Putting together, the YARN component can be visualized as shown below:


What are the pre-requisites:

1. Install three Ubuntu 14.04.1 VMs on VirtualBox. While installing, ensure that the OpenSSH server package is selected; it configures the SSH service automatically.


Ensure that the Bridge Adapter option is configured (as shown below). This ensures that all the nodes can communicate with each other.

Setting up Master Node

1. Login to the master node as a normal user through putty. As you see below, note the master node's IP address, and ensure that the full FQDN is provided for this host.


As shown above, I logged in as user1, which was created by default at installation time. We are soon going to create a user and group for Hadoop.


2. Open /etc/hosts file through vi editor and add the following entries:


As shown above, you need to add the hostname and IP address of each node so that the nodes can identify and ping each other by both hostname and IP address.
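The entries would look something like the following. The IP addresses and hostnames below are placeholders (the real values depend on your network); what matters is that every node carries the same three lines:

```text
# /etc/hosts -- example values only
192.168.1.10   masternode.example.com   masternode
192.168.1.11   datanode1.example.com    datanode1
192.168.1.12   datanode2.example.com    datanode2
```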

Setting up User and Group for Hadoop

3. Let's create a user for Hadoop. First create a group called hadoop, then add a new user called hduser to the newly created hadoop group as shown below:


4. Ensure that the newly created hduser is added to the sudo group (shown below):

The above step is an important step and shouldn’t be skipped.

Enabling Password-less SSH

5. Make sure that hduser can SSH to its own account without a password.


For the first time, try SSHing to localhost by running ssh hduser@localhost. It will ask for a password and add the host to the list of known hosts. Run the exit command and try to SSH again. This time it shouldn't ask for a password (as shown below).

As shown above, the hduser can SSH to its own account without any password.
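Password-less SSH comes down to having a key pair whose public half is listed in authorized_keys. A minimal sketch; the key is written to /tmp/demo_rsa here so it doesn't touch your real ~/.ssh (on the cluster you would use ~/.ssh/id_rsa):

```shell
# Generate an RSA key pair with an empty passphrase (demo location, not ~/.ssh).
rm -f /tmp/demo_rsa /tmp/demo_rsa.pub
ssh-keygen -q -t rsa -N "" -f /tmp/demo_rsa
# On the real node the public key is appended to authorized_keys, e.g.:
#   cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
#   chmod 600 ~/.ssh/authorized_keys
ls /tmp/demo_rsa /tmp/demo_rsa.pub
```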

 Disabling IPv6

 [OPTIONAL] It is recommended to disable IPv6, since the system is going to be used for various Hadoop configurations. Follow the steps below to disable IPv6 on the master node.


8. Reboot the machine so that the kernel parameters are applied correctly.


Remember that you can skip the IPv6 step above in certain conditions, for example if you have just a testing environment.

9.Re-SSH to the master node through putty again.

Configuring JAVA

10. Download the JDK (shown below):


I downloaded jdk-7u71-linux-i586.tar.gz as per my machine architecture. If you are running an x86_64 machine, you will need to download the x64 package from the same link.

11. Create a directory called java under /usr/local with the mkdir utility.


12. Upload the Oracle JDK binaries into the java directory of the Ubuntu machine through WinSCP or whatever tool is available.

13. Unpack the compressed JDK software as shown below:


Once unzipped, you will see the following listing of files.


14.Copy the Oracle JDK unzipped binaries into /usr/local/java directory as shown below:


Verify that all the binaries are copied.


15. Set up the JAVA_HOME environment variable. Open /etc/profile through nano or the vi editor and add the following lines at the end of the file.
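The lines typically look like the following; treat this as a template rather than the post's exact contents. The jdk1.7.0_71 directory name is an assumption matching the archive downloaded earlier, so adjust it to whatever tar actually extracted:

```shell
JAVA_HOME=/usr/local/java/jdk1.7.0_71
PATH=$PATH:$JAVA_HOME/bin
export JAVA_HOME
export PATH
```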

16. Save the file.

17. Run the following command to point out to the correct Oracle JDK location.


18. For JDK to be available for use, run the following command.


19. It is very important to run the below command to reload your system wide PATH under /etc/profile.


20. You can also verify if JAVA_HOME is working or not.


NOTE: We need to configure JAVA the same way we followed above for all the nodes.

21. Before configuring Hadoop, we need to make data node 1 and data node 2 ready. Let’s configure them too.

Setting up Data Node 1

1. Login to one of the data nodes as a normal user through putty. As you see below, note this machine's IP address.


As shown above, I logged in as user2, which was created by default at installation time. We are soon going to create the user and group for Hadoop.

2. Open /etc/hosts file through vi editor and add the following entries:


NOTE: Follow the above step on Data Node 2 too.

3. Just as we created hduser and the hadoop group on the master node, follow the same steps on data node 1 and data node 2.


Ensure you don't miss the step below, which allows sudo access for hduser.


4. This is an important part of the data node configuration. We are going to configure passwordless SSH so that the master node can SSH to all data nodes without a password.

Note: Run the below step on Master Node only.


Try logging in to the slave node from the master node without a password as shown below:


5. Follow steps 10 to 20 above to configure JAVA_HOME on this node too.

Setting up Data Node 2:

1. Login to data node 2 as shown below:


2. Configure /etc/hosts as similar as what we configured for data node 1.


3. Configure User and group for hadoop.


4. Again, this is an important step which IS TO BE RUN ON MASTER ONLY.


The above command enables passwordless SSH from the master to data node 2.

5. Follow steps 10 to 20 above to configure JAVA_HOME on this node too. Once you configure both data nodes you should see something like what is shown below:


Configuring Hadoop: 

NOTE: The commands below are to be run on all of the master and data nodes.

22. Download the Hadoop binaries: run the wget utility (shown below) to fetch them from the Hadoop website.


23. Unzip the hadoop binaries as shown below:


Once you run the above command it will extract the binaries under the same location. You then need to copy them to the /usr/local directory.

24. Unpack the hadoop tar directly into the /usr/local/hadoop-2.3.0 folder:


25. Create a symbolic link /usr/local/hadoop pointing to the hadoop directory, as shown below:


26. Give ownership to hduser and the hadoop group so that they can execute the hadoop binaries:


27. Switch to hduser with the following command:

28. Open the .bashrc in hduser's home directory and add the following entries at the end of the file:
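The entries commonly added for a Hadoop 2.x layout like this one are shown below. Treat them as a template, not the post's exact file: HADOOP_INSTALL matches the /usr/local/hadoop symlink created above, and the JAVA_HOME path is an assumption matching the JDK configured earlier:

```shell
export JAVA_HOME=/usr/local/java/jdk1.7.0_71
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
```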


29. Save the file. Run the bash command to make the environment variables effective, as shown below:


30. Edit the following file to let hadoop know where JAVA_HOME resides. Once the entry is made, you should see the following results.


31. Verify the hadoop installation with the following command:


This shows that Hadoop is properly configured.

32. As the above Hadoop configuration is to be run on all the nodes, ensure that /usr/local/hadoop is the path where Hadoop resides on every node. Follow the same steps on all the data nodes too. For example, if you follow the steps above on data node 1, you should expect the following results:


Now we have master node and data node ready with the basic Hadoop installation.

PLEASE NOTE: In newer versions of hadoop, there is one slight additional step for the JAVA_HOME environment variable setup to work. Open the file under /usr/local/hadoop/libexec and make the following entry too.

Configuring Master Node:

33. Let us first configure the master node's configuration files.

34. Ensure that you are logged in as hduser when running the commands below.


Create required files as shown below on master node.

35. Open the hdfs-site.xml file and add the following entries:

36. Open the file $HADOOP_INSTALL/etc/hadoop/core-site.xml to let the hadoop module know where the master node (name node) resides.


Put the entries only inside the configuration tags, not outside them.
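For Hadoop 2.x the entry in question is fs.defaultFS; the hostname and port below are placeholders for your master node (9000 is a commonly used default, not a value from the original post):

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://masternode:9000</value>
  </property>
</configuration>
```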

37.Format the HDFS filesystem on the master node as shown below:


It takes a few seconds and then the final result is displayed:


38. Edit the file $HADOOP_INSTALL/etc/hadoop/slaves on the master node with entries for all the data nodes.
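The slaves file is simply one data node hostname per line. The hostnames below are placeholders that should match the /etc/hosts entries on your nodes:

```text
datanode1
datanode2
```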


Configuring Data Nodes

39. Login to one of the data nodes (say datanode1) and create the following files:


Once you make the entry, the file should look as shown below:


41. Let the data node know where the master node (namenode) resides by editing the core-site.xml file.


Instead of IP address above, you can use hostname of master node if you have correct entries under /etc/hosts.

42. Now open a session on the master node and run the command below:

As shown below, it SSHes to the data nodes and starts the respective services on them.


Ensure that the required services (name node, data node and YARN) are running, using the jps command:

43. One can verify


44. Ensure that the required services are running at the data node end too.


Also, verify on data node 2 as shown below:


45. You can access Hadoop Name Node details under http://<masternode>:50070.

Under this link, you can access the Datanodes, logs for each namenode, a snapshot of events and overall DFS details too.




As shown above, there are two Data Nodes represented as Live Nodes.

Click on the Datanodes section at the top of the web UI to see the data node status.



You can visualize the secondary namenode through the following URL:


You can see the datanode 1 status under the URL:


In the similar manner, you can see the data node 2 status through URL:


Before I wrap up..

We will now look at basic HDFS shell commands, which form the basis for running MapReduce jobs. The Hadoop file system shell, usually referred to as the fs shell, provides commands for various file operations: copying, changing permissions, viewing file contents, changing ownership, creating directories and much more. You can see the various options available through the command below:
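The operations used in the rest of this section can be summarized in a quick reference. This snippet just prints the cheat sheet; the paths are illustrative and the syntax assumes the Hadoop 2.x `hadoop fs` client:

```shell
# Print a quick reference for the fs shell operations used below (illustrative paths).
cat > /tmp/hdfs_cheatsheet <<'EOF'
hadoop fs -help                                    # list all fs shell options
hadoop fs -df -h                                   # size/usage of the DFS
hadoop fs -mkdir /user/hduser/input                # create a directory in HDFS
hadoop fs -ls /user/hduser                         # list an HDFS path
hadoop fs -copyFromLocal alpha /user/hduser/input  # local file -> HDFS
hadoop fs -copyToLocal /user/hduser/input/alpha .  # HDFS -> local file
hadoop fs -du /user/hduser/input                   # length of files in a directory
hadoop fs -stat /user/hduser/input                 # stat info for an HDFS path
EOF
cat /tmp/hdfs_cheatsheet
```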


Listing the size of DFS:


Creating a directory in HDFS. This is very similar to unix command.


Listing the HDFS file system:


Copy a file from your local system to HDFS file system:


As shown above, first create an empty file called alpha in some local folder and add some contents to it through an editor. Then use the -copyFromLocal option to copy it from the local file system to the HDFS file system.

Copy a file from HDFS to local system:


As shown above, first delete the alpha file from your local machine. Now run the -copyToLocal option to copy it from HDFS to your local machine.

Displaying the length of the file contained in a directory:


Displaying the stat information for a HDFS path:


You can always refer to HDFS man pages for detailed options for command line utilities for operation on HDFS.

In our next post, we will cover the various components of the Hadoop ecosystem.