Joris Bontje

Automated Export of Cloudera Manager Configuration for Hadoop

Joris Bontje

Cloudera Manager is a web-based management application for your Apache Hadoop cluster. It makes installing and configuring a Hadoop cluster a whole lot easier and is free for clusters of up to 50 nodes. In particular, I like the suggested configuration settings based on your cluster hardware.

All the configuration settings of Cloudera Manager are persisted in the configuration database, which can be manually exported through the admin interface. One of our clients wanted to export these settings programmatically for auditing and backup purposes.

Currently there isn’t an automated way to do that, besides backing up the entire database. Here is a little shell script that allows you to download the configuration automatically in text format.

 Read more

Sentiment Analysis using Apache Hive

Joris Bontje

Apache Hive is a data warehouse system built on top of Hadoop. Using a SQL-like language (HiveQL), you can query data stored in the Hadoop Distributed File System (HDFS). Those queries are then translated into MapReduce jobs and executed on your cluster.

As an example we’ll analyze tweets from the Twitter Streaming logs and calculate the top 5 hashtags per day which are associated with positive sentiment signals (smileys).

You can imagine how this could be expanded into simple sentiment analysis of your (potential) customer feedback.
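To make the idea concrete before moving to Hive, here is a minimal Python sketch of the same aggregation on a handful of in-memory tweets. It is not the Hive query from the post: the smiley list, the hashtag pattern, the function name `top_hashtags_per_day` and the `(day, text)` input shape are all assumptions chosen for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumption: these emoticons are treated as "positive sentiment signals".
POSITIVE_SMILEYS = (":)", ":-)", ":D", ":-D", "=)")

# Assumption: a hashtag is '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def top_hashtags_per_day(tweets, n=5):
    """Return {day: [(hashtag, count), ...]} for tweets containing a positive smiley.

    `tweets` is an iterable of (day, text) pairs (hypothetical input shape).
    """
    counts = defaultdict(Counter)
    for day, text in tweets:
        # Skip tweets without any positive smiley.
        if not any(smiley in text for smiley in POSITIVE_SMILEYS):
            continue
        # Count each hashtag case-insensitively, grouped by day.
        for tag in HASHTAG_RE.findall(text):
            counts[day][tag.lower()] += 1
    # Keep only the n most frequent hashtags for each day.
    return {day: counter.most_common(n) for day, counter in counts.items()}


if __name__ == "__main__":
    sample = [
        ("2012-06-01", "Loving #Hadoop :)"),
        ("2012-06-01", "#hadoop and #hive rock :D"),
        ("2012-06-01", "#hive is slow :("),
        ("2012-06-02", "New cluster! #bigdata :-)"),
    ]
    for day, tags in sorted(top_hashtags_per_day(sample).items()):
        print(day, tags)
```

On a cluster, the same three steps (filter on smileys, extract hashtags, count and rank per day) would be expressed as a query that Hive translates into MapReduce jobs; the sketch only shows the logic on a small scale.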

 Read more