Hadoop Installation

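I installed Hadoop 1.0.0 in pseudo-distributed mode on a Sakura VPS 512 (Debian). The commands run as root up to the SSH key step, and as the hadoop user from there on.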

Create the hadoop user

groupadd -g 2000 hadoop
useradd -u 2000 -g hadoop -d /home/hadoop -m -s /bin/bash hadoop

Install Java

cd /usr/local/src
wget -c http://download.oracle.com/otn-pub/java/jdk/7u2-b13/jdk-7u2-linux-x64.tar.gz
tar zxvf jdk-7u2-linux-x64.tar.gz
mv -i jdk1.7.0_02 /usr/local/jdk-1.7
ln -s /usr/local/jdk-1.7 /usr/local/java
/usr/local/java/bin/java -version

Install Hadoop

cd /usr/local/src
wget -c http://ftp.kddilabs.jp/infosystems/apache//hadoop/common/hadoop-1.0.0/hadoop-1.0.0.tar.gz
tar zxvf hadoop-1.0.0.tar.gz
mv -i hadoop-1.0.0 /usr/local/.
ln -s /usr/local/hadoop-1.0.0 /usr/local/hadoop
chown -R hadoop:hadoop /usr/local/hadoop-1.0.0

Configure Hadoop

I edited the Hadoop configuration by hand, but it looks like /usr/local/hadoop/sbin/hadoop-setup-conf.sh can do it too.

core-site.xml

vi /usr/local/hadoop/conf/core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop</value>
  </property>

  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>

</configuration>

hdfs-site.xml

vi /usr/local/hadoop/conf/hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->
 
<configuration>

  <property>
    <name>dfs.name.dir</name>
    <value>${hadoop.tmp.dir}/dfs/name</value>
  </property>

  <property>
    <name>dfs.data.dir</name>
    <value>${hadoop.tmp.dir}/dfs/data</value>
  </property>

  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

</configuration>

mapred-site.xml

vi /usr/local/hadoop/conf/mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>

  <property>
    <name>mapred.local.dir</name>
    <value>${hadoop.tmp.dir}/mapred</value>
  </property>

</configuration>
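To double-check what ended up in the three site files, a value can be read back with a few lines of shell. This is only a sketch: get_prop is a hypothetical helper name, and it assumes the one-tag-per-line layout used in the files above rather than parsing XML properly.

```shell
# get_prop FILE NAME: print the <value> that follows <name>NAME</name> in FILE.
# Assumption: <name> and <value> each sit on their own line, as in the files above.
get_prop() {
  awk -v name="$2" '
    # Remember that the requested property name has been seen.
    index($0, "<name>" name "</name>") { found = 1; next }
    # The first <value> line after it holds the setting; strip the tags and stop.
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }
  ' "$1"
}
```

For example, get_prop /usr/local/hadoop/conf/core-site.xml fs.default.name should print the hdfs:// URL set above. References such as ${hadoop.tmp.dir} are printed as written, not expanded.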

hadoop-env.sh

In hadoop-env.sh, uncomment and edit only the following environment variables.

vi /usr/local/hadoop/conf/hadoop-env.sh

export JAVA_HOME=/usr/local/java
export HADOOP_LOG_DIR=/var/log/hadoop
export HADOOP_PID_DIR=/var/run/hadoop

Create directories

mkdir -p /hadoop
mkdir -p /var/run/hadoop
mkdir -p /var/log/hadoop
chmod 777 /hadoop
chown -R hadoop:hadoop /hadoop
chown -R hadoop:hadoop /var/run/hadoop
chown -R hadoop:hadoop /var/log/hadoop
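Before starting the daemons it can help to confirm that the hadoop user can really write to these directories. A minimal sketch, where not_writable is a hypothetical helper name:

```shell
# not_writable DIR...: print each argument that is not a directory
# writable by the user running the check. Prints nothing if all are fine.
not_writable() {
  for d in "$@"; do
    if [ ! -d "$d" ] || [ ! -w "$d" ]; then
      printf '%s\n' "$d"
    fi
  done
}
```

Run as the hadoop user, not_writable /hadoop /var/run/hadoop /var/log/hadoop should print nothing. Run as root it says little, since root can write almost anywhere.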

Create SSH keys

su - hadoop
ssh-keygen -t rsa -P ""
cat .ssh/id_rsa.pub >> .ssh/authorized_keys
chmod 600 .ssh/authorized_keys
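start-all.sh logs in to localhost over SSH, so the key files need restrictive permissions. The check below is a sketch with hypothetical helper names (perm_bits, ssh_perms_ok). It insists on exactly rwx------ for the directory and rw------- for authorized_keys, which is the conventional setup and stricter than sshd itself requires.

```shell
# perm_bits PATH: print the nine rwx permission characters reported by `ls -ld`.
perm_bits() { ls -ld "$1" | cut -c2-10; }

# ssh_perms_ok DIR: succeed only if DIR is rwx------ and
# DIR/authorized_keys is rw-------. A missing file counts as a failure.
ssh_perms_ok() {
  [ "$(perm_bits "$1")" = "rwx------" ] &&
    [ "$(perm_bits "$1/authorized_keys" 2>/dev/null)" = "rw-------" ]
}
```

As the hadoop user: ssh_perms_ok ~/.ssh && echo ok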

Format the NameNode

/usr/local/hadoop/bin/hadoop namenode -format
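Formatting wipes any existing HDFS metadata, so when rerunning these steps it may be worth guarding the command. A small sketch, assuming that a formatted name directory contains current/VERSION (true on my understanding of Hadoop 1.0, but worth checking on your install). is_formatted is a hypothetical helper name.

```shell
# is_formatted NAME_DIR: succeed if NAME_DIR already looks like a formatted
# NameNode directory, judged by the presence of current/VERSION (assumption).
is_formatted() { [ -f "$1/current/VERSION" ]; }
```

With the settings above, dfs.name.dir resolves to /hadoop/dfs/name, so the guarded form would be: is_formatted /hadoop/dfs/name || /usr/local/hadoop/bin/hadoop namenode -format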

Start Hadoop

/usr/local/hadoop/bin/start-all.sh
/usr/local/java/bin/jps
xxx SecondaryNameNode
xxx NameNode
xxx Jps
xxx JobTracker
xxx DataNode
xxx TaskTracker
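The five daemons in the jps listing above are what a pseudo-distributed setup should show. Checking for them by eye gets tedious, so here is a sketch that reads jps output on stdin and names whichever daemon is missing. missing_daemons is a hypothetical helper name.

```shell
# missing_daemons: read `jps` output ("PID Name" per line) on stdin and
# print each expected Hadoop daemon that does not appear in it.
missing_daemons() {
  out=$(cat)
  for d in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
    # Compare the second field exactly, so NameNode does not match SecondaryNameNode.
    printf '%s\n' "$out" |
      awk -v d="$d" '$2 == d { found = 1 } END { exit !found }' ||
      printf '%s\n' "$d"
  done
}
```

Usage: /usr/local/java/bin/jps | missing_daemons. No output means all five are running. If a name is printed, the matching log under /var/log/hadoop is the place to look.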