Running Hadoop over Lustre

This blog post describes how to run Apache Hadoop over Lustre, an open source distributed parallel file system.

Hadoop is a large-scale, distributed, open source framework for parallel processing of data on large clusters built of commodity hardware. While it can be used on a single machine, its true power lies in its ability to scale to hundreds or thousands of computers, each with several processor cores. Hadoop is also designed to efficiently distribute large amounts of work across a set of machines.

Hadoop is built on two main parts: a special file system called the Hadoop Distributed File System (HDFS) and the MapReduce framework. HDFS is a file system optimized for distributed processing of very large datasets on commodity hardware. HDFS stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Hadoop implements a computational paradigm named Map/Reduce, in which the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. Both MapReduce and the Hadoop Distributed File System are designed so that node failures are automatically handled by the framework.

Hadoop runs on a master-slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients, and a number of DataNodes, usually one per node in the cluster. The DataNodes manage the storage attached to the nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files. A file is split into one or more blocks, and those blocks are stored on DataNodes. DataNodes serve read and write requests and perform block creation, deletion, and replication upon instruction from the NameNode.
On the other hand, Lustre is an open-source distributed parallel file system. Lustre is a scalable, secure, robust, and highly available cluster file system that addresses the I/O needs of large computing clusters, such as low latency and extreme performance. Lustre is basically an object-based file system composed of three functional components: metadata servers (MDSs), object storage servers (OSSs), and clients. The MDS provides metadata services: it stores file system metadata such as file names, directory structures, and access permissions. Availability of the metadata server is critical for the file system, so in a typical configuration there are two MDSs configured for high-availability failover. One MDS per file system manages one metadata target (MDT), and since an MDS stores only metadata, the MDT need only store hundreds of gigabytes for a multi-terabyte file system.

Why Lustre and not HDFS?

The Hadoop Distributed File System fits general statistics applications well, but it can become a performance bottleneck for computationally complex applications, for example High Performance Computing applications that generate large, ever-increasing outputs. Secondly, HDFS is not POSIX compliant, which means it cannot be used as a normal file system and is therefore difficult to extend. Also, HDFS has a WORM (write-once, read-many) access model: changing a small part of a file requires that all file data be copied, resulting in very time-intensive file modifications. Finally, in Hadoop's Map/Reduce model the reduce nodes use HTTP to shuffle all of the related map task outputs before the real reduce work begins, which consumes considerable resources and generates a lot of I/O and merge/spill operations.

Setting up Hadoop over Lustre

Setting up Metadata Server

I assume that you have CentOS/RHEL 6.x installed on your hardware. I have RHEL 6.x available and will be using it for this demonstration; this should work for CentOS 6.x versions too. Firewall and SELinux both need to be disabled. You can disable the firewall through the iptables service, and SELinux can be disabled through the commands below:
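
The original screenshot is missing; a typical sequence on RHEL/CentOS 6.x (run as root) looks like this:

[root@MDS ~] # service iptables stop
[root@MDS ~] # chkconfig iptables off
[root@MDS ~] # setenforce 0
[root@MDS ~] # sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config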

Install the Lustre-related packages through the YUM repository.
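
The original command is not visible; assuming a Lustre server YUM repository is already configured, the install is roughly the following (exact package names vary by Lustre release):

[root@MDS ~] # yum install kernel lustre lustre-modules lustre-osd-ldiskfs e2fsprogs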

Reboot the machine to boot into the Lustre kernel. Once the machine is up, verify the Lustre kernel through the following command:
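
The screenshot is missing; the usual check is uname, whose output should include a "lustre" tag in the kernel release string:

[root@MDS ~] # uname -r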

Create an LVM partition on /dev/sda2 (ensure it is of the LVM partition type at your end):
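
The screenshot is missing; as a sketch, the volume group name (vg00), the logical volume name (mdt), and the size below are assumptions for illustration:

[root@MDS ~] # pvcreate /dev/sda2
[root@MDS ~] # vgcreate vg00 /dev/sda2
[root@MDS ~] # lvcreate -L 10G -n mdt vg00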

Run the mkfs.lustre utility to create a file system named lustre on this server, including the metadata target (MDT) and the management server (MGS), as shown below:
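
The original command is not visible; a minimal sketch, assuming the logical volume /dev/vg00/mdt created in the previous step, is:

[root@MDS ~] # mkfs.lustre --fsname=lustre --mgs --mdt --index=0 /dev/vg00/mdt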

Now, create a mount point and start the MDS node as shown below:
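
Again a sketch, assuming the MDT device from the previous step and /mnt/mdt as the mount point:

[root@MDS ~] # mkdir -p /mnt/mdt
[root@MDS ~] # mount -t lustre /dev/vg00/mdt /mnt/mdt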

Run the mount command to verify the overall setup:
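
For example, listing only Lustre-type mounts should now show the MDT mounted on /mnt/mdt (the mount point assumed above):

[root@MDS ~] # mount -t lustre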

Ensure the MDS-related network services (LNET) are up:

[root@MDS ~] # modprobe lnet
[root@MDS ~] # lctl network up
[root@MDS ~] # lctl list_nids
192.168.1.185@tcp

Hence, this completes the MDS configuration.

Setting up Object Storage Server (OSS1)

The Object Storage Server needs to be set up on a separate machine. I assume Lustre is installed on that box and it has been booted into the Lustre kernel. As we did earlier for the MDS, we need to create several logical volumes, namely ost1 to ost6, as shown below:
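
The screenshot is missing; as a sketch, the backing device (/dev/sdb) and the volume size are assumptions, while the volume group name vg00 matches the device paths used below:

[root@oss1 ~] # pvcreate /dev/sdb
[root@oss1 ~] # vgcreate vg00 /dev/sdb
[root@oss1 ~] # for i in 1 2 3 4 5 6; do lvcreate -L 6G -n ost$i vg00; done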

Use the mkfs.lustre command to create the Lustre file systems as shown below:
[root@oss1-0 ~] # mkfs.lustre --fsname lustre --ost --mgsnode=192.168.1.185@tcp0 /dev/vg00/ost1

Run the same command for the remaining targets, ost2 through ost6:

mkfs.lustre --fsname lustre --ost --mgsnode=192.168.1.185@tcp0 /dev/vg00/ost2
mkfs.lustre --fsname lustre --ost --mgsnode=192.168.1.185@tcp0 /dev/vg00/ost3
mkfs.lustre --fsname lustre --ost --mgsnode=192.168.1.185@tcp0 /dev/vg00/ost4
mkfs.lustre --fsname lustre --ost --mgsnode=192.168.1.185@tcp0 /dev/vg00/ost5
mkfs.lustre --fsname lustre --ost --mgsnode=192.168.1.185@tcp0 /dev/vg00/ost6

It's time to start the OSS by mounting each OST on its corresponding mount point, for example:

#mount -t lustre /dev/vg00/ost1 /mnt/ost1

Repeat the mount for ost2 through ost6. Finally, running the mount command should list all six OSTs mounted as type lustre, and the corresponding Lustre devices should show up as active.

This completes OSS1 configuration.
Follow the same steps as above for OSS2. It is always recommended to stripe over all the OSTs; to do so, run this command on the Lustre client:

#lfs setstripe -c -1 /mnt/lustre

Setting up Lustre Client #1
All clients mount the same file system, identified by the MDS. Use the following commands, specifying the IP address of the MDS server:
#mkdir -p /mnt/lustre
#mount -t lustre 192.168.1.185@tcp0:/lustre /mnt/lustre

You can use the lfs utility to inspect the overall file system information from the client system (as shown below).
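
The screenshot is missing; a likely candidate here (an assumption on my part) is lfs df, which reports per-OST usage and the total file system size:

[root@lustreclient1 ~] # lfs df -h /mnt/lustre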

It shows that the overall file system size of /mnt/lustre is around 70GB. Striping of data is an important aspect of the scalability and performance of the Lustre file system: data gets striped across blocks on multiple OSTs. The stripe count can be set at the file system, directory, or file level.
You can view the striping details through the below command:

[root@lustreclient1 ~] # lfs getstripe /mnt/lustre
/mnt/lustre
stripe_count: 1 stripe_size: 1048576 stripe_offset: -1

Hence, we have the Lustre setup ready. Now it's time to run Hadoop over Lustre instead of HDFS.
I assume that Hadoop is already installed, with 1 name node (master) and 4 data nodes (slaves).
On the master node, let's perform the following configuration file changes.

File: /usr/local/hadoop/conf/core-site.xml
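
The original configuration screenshot is missing. A minimal sketch, assuming Hadoop 1.x property names and that the Lustre client is mounted at /mnt/lustre on every node, is to point the default file system at a directory on Lustre (accessed through the ordinary POSIX file:// path) and to keep Hadoop's temporary data there as well; the exact values in the original post may differ:

<configuration>
  <!-- Use the POSIX path on the Lustre mount instead of hdfs:// -->
  <property>
    <name>fs.default.name</name>
    <value>file:///mnt/lustre/hadoop</value>
  </property>
  <!-- Keep Hadoop scratch data on Lustre too (path is an assumption) -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/mnt/lustre/hadoop/tmp</value>
  </property>
</configuration>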

File: /usr/local/hadoop/conf/mapred-site.xml
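
Again a sketch with assumed values: the JobTracker address ("master" is an assumed hostname) and the shared directory on Lustre are illustrative, and the property names are from Hadoop 1.x:

<configuration>
  <!-- JobTracker endpoint; "master" is an assumed hostname -->
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
  <!-- Shared MapReduce system directory, placed on the Lustre mount -->
  <property>
    <name>mapred.system.dir</name>
    <value>/mnt/lustre/hadoop/mapred/system</value>
  </property>
</configuration>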

On every slave node (data node), let's perform the following configuration changes.

File: /usr/local/hadoop/conf/core-site.xml
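
The screenshot is missing; presumably the slaves get the same core-site.xml change as the master, so that every node resolves paths against the shared Lustre mount:

<configuration>
  <!-- Same default file system as on the master: the shared Lustre mount -->
  <property>
    <name>fs.default.name</name>
    <value>file:///mnt/lustre/hadoop</value>
  </property>
</configuration>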

File: /usr/local/hadoop/conf/mapred-site.xml
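
Likewise a sketch for the slaves: at minimum they need to know where the JobTracker runs (hostname again assumed to be "master"):

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>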

We are good to start the Hadoop-related services.
On the master node, start the MapReduce services without starting HDFS (since we are going to use Lustre only), as shown below:
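
The screenshot is missing; one way to do this with a Hadoop 1.x layout under /usr/local/hadoop is to start only the JobTracker daemon on the master (bin/start-mapred.sh would also work, but it additionally starts TaskTrackers on the hosts listed in conf/slaves). The hostnames in the prompts below are illustrative:

[root@master ~] # /usr/local/hadoop/bin/hadoop-daemon.sh start jobtracker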

You can verify that the service is running through the jps utility:
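
On the master, the output of jps should include a JobTracker process:

[root@master ~] # jps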

Start the tasktracker on the slave nodes through the following command:
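
The screenshot is missing; with the same Hadoop 1.x layout, the per-node daemon script does it (run this on each slave):

[root@slave1 ~] # /usr/local/hadoop/bin/hadoop-daemon.sh start tasktracker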

You can just run a simple hadoop word count example (as shown below):
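
A sketch of the missing command; the input/output paths on the Lustre mount are assumptions, and the examples jar name depends on your Hadoop release:

[root@master ~] # /usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/hadoop-examples-*.jar wordcount /mnt/lustre/hadoop/input /mnt/lustre/hadoop/output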

Hence, we have successfully run Hadoop over Lustre.
