Thursday 12 September 2013

Using Apache Whirr & Amazon EC2

So once you are done with the basics of installing Hadoop locally and running the customary "Hello World" (word count, in Hadoop parlance), it is time to put Hadoop to some real use to understand its value. Doing any heavy lifting with a fairly large amount of data needs decent infrastructure. I looked at a few cloud providers and figured out that Amazon EC2 works out cheaper. I was also promised a $50 AWS credit for participating in a Red Hat survey.

There are plenty of posts on how to set up Hadoop on EC2; I followed this one. I also figured out that installing Hadoop in the cloud is fairly cumbersome if you do it by hand. Apache Whirr, a tool for provisioning cloud services, helps get things going pretty fast.
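For reference, a Whirr recipe for a small Hadoop cluster looks roughly like the sketch below. The property names come from the Whirr quick-start guide; the cluster name, instance counts and hardware id are just placeholders, so adjust them to your own setup and budget.

    # hadoop-ec2.properties -- illustrative Whirr recipe
    whirr.cluster-name=myhadoopcluster
    # one master (namenode + jobtracker) and two workers (datanode + tasktracker)
    whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,2 hadoop-datanode+hadoop-tasktracker
    whirr.provider=aws-ec2
    whirr.identity=${env:AWS_ACCESS_KEY_ID}
    whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
    whirr.hardware-id=m1.large
    whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
    whirr.public-key-file=${sys:user.home}/.ssh/id_rsa.pub

Launching the whole cluster is then a single command:

    bin/whirr launch-cluster --config hadoop-ec2.properties

Whirr prints the instance details as they come up and drops the client-side Hadoop configuration plus a proxy script under ~/.whirr/myhadoopcluster/.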

So using Apache Whirr and Amazon EC2, I was able to set up my first Hadoop cluster and run a few word-count MapReduce jobs.
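For completeness, running the word count against the new cluster goes roughly as below, using the examples jar that ships with Hadoop. The proxy script and configuration directory are the ones Whirr generates; the input file name and HDFS paths are just illustrative.

    # keep this running in a separate terminal to tunnel requests to the cluster
    sh ~/.whirr/myhadoopcluster/hadoop-proxy.sh

    # point the local hadoop client at the cluster's generated config
    export HADOOP_CONF_DIR=~/.whirr/myhadoopcluster

    hadoop fs -mkdir input
    hadoop fs -put sometext.txt input
    hadoop jar $HADOOP_HOME/hadoop-examples-*.jar wordcount input output
    hadoop fs -cat 'output/part-*' | head

When you are done, whirr destroy-cluster --config hadoop-ec2.properties tears the instances down so the EC2 bill stops ticking.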