There are several ways to install Hadoop and Hive. An easy way to install a complete Hadoop system, including Hive, is to download a preconfigured virtual machine (VM) that runs in VMware [1] or VirtualBox [2].
[1] http://vmware.com
[2] https://www.virtualbox.org/
http://doc.mapr.com/display/MapR/Quick+Start+-+Test+Drive+MapR+on+a+Virtual+Machine
While using a preconfigured virtual machine may be an easy way to run Hive, installing Hadoop and Hive yourself will give you valuable insights into how these tools work, especially if you are a developer.
The instructions that follow describe the minimum Hadoop and Hive installation steps needed for a personal Linux machine.
Installing Java
Hive requires Hadoop, and Hadoop requires Java. Ensure your system has a recent v1.6.X or v1.7.X JVM (Java Virtual Machine). Although the JRE (Java Runtime Environment) is all you need to run Hive, you will need the full JDK (Java Development Kit) if you plan to compile Java code, for example to extend Hive. You’ll also need to ensure that Java is in your path and that the JAVA_HOME environment variable is set.
Linux-specific Java steps
On Linux systems, the following instructions set up a bash file in the /etc/profile.d/ directory that defines JAVA_HOME for all users. This requires “root” access.
Note: $ denotes the bash shell prompt.
$ /usr/java/latest/bin/java -version
Output:
java version "1.6.0_23"
Java(TM) SE Runtime Environment (build 1.6.0_23-b05)
Java HotSpot(TM) 64-Bit Server VM (build 19.0-b09, mixed mode)
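# sudo does not apply to a shell redirect, so pipe through "sudo tee" instead.
$ echo 'export JAVA_HOME=/usr/java/latest' | sudo tee /etc/profile.d/java.sh > /dev/null
$ echo 'export PATH=$PATH:$JAVA_HOME/bin' | sudo tee -a /etc/profile.d/java.sh > /dev/null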
$ . /etc/profile
$ echo $JAVA_HOME
Output:
/usr/java/latest
However, if you don’t want to make permanent changes that affect all users of the system, an alternative is to put the definitions shown for PATH and JAVA_HOME in your $HOME/.bashrc file:
export JAVA_HOME=/usr/java/latest
export PATH=$PATH:$JAVA_HOME/bin
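To confirm that the settings took effect in a new shell, a small check along the following lines may help. It is only a sketch: the function name check_java_home is hypothetical, and it assumes the usual JDK layout in which the launcher lives at $JAVA_HOME/bin/java.

```shell
# check_java_home DIR: succeed if DIR looks like a usable Java home,
# i.e. DIR/bin/java exists and is executable. (Hypothetical helper.)
check_java_home() {
    cj_dir="$1"
    if [ -z "$cj_dir" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$cj_dir/bin/java" ]; then
        echo "no executable java launcher under $cj_dir/bin" >&2
        return 1
    fi
    echo "JAVA_HOME looks usable: $cj_dir"
}

# Report on the current environment without aborting the shell.
check_java_home "${JAVA_HOME:-}" || echo "fix JAVA_HOME before installing Hadoop"
```

Passing the directory as an argument, rather than reading JAVA_HOME inside the function, makes it easy to try a candidate location before committing it to .bashrc.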
Installing Hadoop
Hive runs on top of Hadoop. To install Hadoop on a Linux system, run the following commands.
$ wget http://www.us.apache.org/dist/hadoop/common/hadoop-0.20.2/hadoop-0.20.2.tar.gz
$ tar -xzf hadoop-0.20.2.tar.gz
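# $PWD is expanded now; $PATH and $HADOOP_HOME are left for login time.
$ echo "export HADOOP_HOME=$PWD/hadoop-0.20.2" | sudo tee /etc/profile.d/hadoop.sh > /dev/null
$ echo 'export PATH=$PATH:$HADOOP_HOME/bin' | sudo tee -a /etc/profile.d/hadoop.sh > /dev/null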
$ . /etc/profile
By default, Hadoop runs in “local mode.” One concern for developers working on personal machines is that local mode doesn’t closely resemble the behavior of a real cluster, which is important to remember when testing applications. To address this need, a single machine can be configured to run in pseudodistributed mode, where the behavior is identical to distributed mode: filesystem references default to the distributed filesystem and jobs are managed by the JobTracker service, but there is just a single machine, hence a single-node “cluster.”
Because Hive uses Hadoop jobs for most of its work, its behavior reflects the Hadoop mode you’re using. However, even when running in distributed mode, Hive can decide on a per-query basis whether it can perform the query using just local mode, where it reads the data files and manages the MapReduce tasks itself, providing faster turnaround. Hence, for Hive the distinction between the modes is more of an execution style than a deployment style, as it is for Hadoop.
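As a rough illustration of what pseudodistributed mode involves, the sketch below writes the three configuration files that a Hadoop 0.20.x single-node setup typically overrides. The property names (fs.default.name, dfs.replication, mapred.job.tracker) are the standard ones for that release line; the ports 9000 and 9001, the CONF_OUT variable, and the write_site_xml helper are assumptions chosen for illustration. The files go to a scratch directory rather than $HADOOP_HOME/conf so that nothing is overwritten.

```shell
# Scratch output directory (an assumption; set CONF_OUT to override it).
CONF_OUT="${CONF_OUT:-$(mktemp -d)}"
mkdir -p "$CONF_OUT"

# write_site_xml FILE NAME VALUE: emit a one-property Hadoop site file.
# (Hypothetical helper, not part of Hadoop.)
write_site_xml() {
    cat > "$CONF_OUT/$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

# Filesystem references default to HDFS on this machine.
write_site_xml core-site.xml fs.default.name hdfs://localhost:9000
# A single node can hold only one replica of each block.
write_site_xml hdfs-site.xml dfs.replication 1
# Jobs are submitted to a JobTracker running on this machine.
write_site_xml mapred-site.xml mapred.job.tracker localhost:9001

echo "wrote pseudodistributed config sketches to $CONF_OUT"
```

After reviewing the generated files, they could be copied into the Hadoop configuration directory; the filesystem would still need to be formatted and the daemons started before the mode is usable.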
If running the hadoop command produces an error message that hadoop isn’t found, either invoke the command with the full path (e.g., $HOME/hadoop-0.20.2/bin/hadoop) or add the bin directory to your PATH variable.
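The lookup just described can be sketched as a small shell function. The name find_hadoop is hypothetical, and the fallback location $HOME/hadoop-0.20.2 is only the example path used above; pass a different directory if Hadoop was unpacked elsewhere.

```shell
# find_hadoop [FALLBACK_DIR]: print the path of a usable hadoop launcher.
# Looks on PATH first, then under FALLBACK_DIR/bin. (Hypothetical helper.)
find_hadoop() {
    fh_fallback="${1:-$HOME/hadoop-0.20.2}"
    if command -v hadoop >/dev/null 2>&1; then
        command -v hadoop
    elif [ -x "$fh_fallback/bin/hadoop" ]; then
        echo "$fh_fallback/bin/hadoop"
    else
        echo "hadoop not found on PATH or under $fh_fallback/bin" >&2
        return 1
    fi
}

if hadoop_bin=$(find_hadoop); then
    echo "using $hadoop_bin"
else
    echo "add the Hadoop bin directory to PATH, e.g. PATH=\$PATH:\$HOME/hadoop-0.20.2/bin"
fi
```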
Installing Hive
Installing Hive is similar to installing Hadoop. We will download and extract a tarball for Hive, which does not include an embedded version of Hadoop. Hive uses the environment variable HADOOP_HOME to locate the Hadoop JARs and configuration files. So, make sure you set that variable as discussed above before proceeding.
$ curl -O http://archive.apache.org/dist/hive/hive-0.9.0/hive-0.9.0-bin.tar.gz
$ tar -xzf hive-0.9.0-bin.tar.gz
$ sudo mkdir -p /user/hive/warehouse
$ sudo chmod a+rwx /user/hive/warehouse
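Define the HIVE_HOME variable if required, using the name of the directory the tarball actually extracted to:
$ echo "export HIVE_HOME=$PWD/hive-0.9.0" | sudo tee /etc/profile.d/hive.sh > /dev/null
$ echo 'export PATH=$PATH:$HIVE_HOME/bin' | sudo tee -a /etc/profile.d/hive.sh > /dev/null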
$ . /etc/profile
The core of a Hive binary distribution contains three parts. The main part is the Java code itself. Multiple JAR (Java archive) files such as hive-exec*.jar and hive-metastore*.jar are found under the $HIVE_HOME/lib directory. Each JAR file implements a particular subset of Hive’s functionality. The $HIVE_HOME/bin directory contains executable scripts that launch various Hive services, including the hive command-line interface (CLI). The CLI is the most popular way to use Hive.
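To see this layout on your own machine, a listing like the following may be useful. It is a minimal sketch: list_hive_layout is a hypothetical helper, and it assumes only the lib and bin subdirectories described above.

```shell
# list_hive_layout HIVE_DIR: show the Hive JARs and launcher scripts
# found under HIVE_DIR/lib and HIVE_DIR/bin. (Hypothetical helper.)
list_hive_layout() {
    lh_home="$1"
    echo "JAR files in $lh_home/lib:"
    for lh_jar in "$lh_home"/lib/hive-*.jar; do
        # Skip the unexpanded pattern when no JAR matches.
        if [ -e "$lh_jar" ]; then
            echo "  $(basename "$lh_jar")"
        fi
    done
    echo "Scripts in $lh_home/bin:"
    for lh_script in "$lh_home"/bin/*; do
        # Only regular, executable files count as launcher scripts.
        if [ -f "$lh_script" ] && [ -x "$lh_script" ]; then
            echo "  $(basename "$lh_script")"
        fi
    done
}

if [ -n "${HIVE_HOME:-}" ]; then
    list_hive_layout "$HIVE_HOME"
else
    echo "HIVE_HOME is not set; define it as described above"
fi
```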