Let's note/Hadoop on FreeBSD (standalone mode)

FreeBSD provides Hadoop 1.0 in the ports collection, so it is easy to install and run, even on a laptop.
All of the work below is done as root.

# cd /usr/ports/devel/hadoop/
# make
# make install

For reasons I do not fully understand, make install failed on some machines with the following error.

# make install
===>  Installing for hadoop-1.0.0
===>   hadoop-1.0.0 depends on file: /usr/local/bin/bash - found
===>   hadoop-1.0.0 depends on file: /usr/local/openjdk6/bin/java - found
===>   Generating temporary packing list
===>  Checking if devel/hadoop already installed
===> Creating users and/or groups.
Creating group `hadoop' with gid `955'.
Creating user `hadoop' with uid `955'.
pw: user 'hadoop' disappeared during update
*** Error code 67

The error says that the user hadoop could not be created, but checking shows the following.

# grep hadoop /etc/passwd /etc/master.passwd 
/etc/passwd:hadoop:*:955:955:hadoop user:/nonexistent:/usr/sbin/nologin
/etc/master.passwd:hadoop:*:955:955::0:0:hadoop user:/nonexistent:/usr/sbin/nologin

# pw usershow hadoop
pw: no such user `hadoop'
# id hadoop
id: hadoop: no such user

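The user does exist at the text-file level, but it is missing from the DB-format password database. Rebuilding the database from the text file appeared to fix it.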

# pwd_mkdb /etc/master.passwd 
# pw usershow hadoop
hadoop:*:955:955::0:0:hadoop user:/nonexistent:/usr/sbin/nologin
# id hadoop
uid=955(hadoop) gid=955(hadoop) groups=955(hadoop)

# make install
===>  Installing for hadoop-1.0.0
===>   hadoop-1.0.0 depends on file: /usr/local/bin/bash - found
===>   hadoop-1.0.0 depends on file: /usr/local/openjdk6/bin/java - found
===>   Generating temporary packing list
===>  Checking if devel/hadoop already installed
===> Creating users and/or groups.
Using existing group `hadoop'.
Using existing user `hadoop'.
===> Installing rc.d startup script(s)
=> Creating RUNDIR /var/run/hadoop... => Creating LOGDIR /var/log/hadoop... ===> Correct pkg-plist sequence to create group(s) and user(s)
===>   Registering installation for hadoop-1.0.0
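
The mismatch described above (an entry present in the text file but not resolvable through the user database) can be checked with a small script. This is only a sketch under my own assumptions: the function names user_in_file, user_resolvable and user_db_status are hypothetical helpers, not part of FreeBSD or the port, and the "stale-db" label is my own wording for the state that pwd_mkdb fixed here.

```shell
#!/bin/sh
# Hypothetical helpers (not part of FreeBSD or the hadoop port).

# Succeeds if the passwd-format text file "$2" has an entry
# whose first colon-separated field is exactly "$1".
user_in_file() {
    awk -F: -v u="$1" '$1 == u { found = 1 } END { exit !found }' "$2"
}

# Succeeds if the system can resolve user "$1"
# (the same check as running `id hadoop` by hand).
user_resolvable() {
    id "$1" >/dev/null 2>&1
}

# Prints one of:
#   ok        the user resolves normally
#   stale-db  the user is in the text file but does not resolve
#   missing   the user is in neither place
user_db_status() {
    name=$1
    file=${2:-/etc/master.passwd}
    if user_resolvable "$name"; then
        echo ok
    elif user_in_file "$name" "$file"; then
        echo stale-db
    else
        echo missing
    fi
}
```

Run as root, `user_db_status hadoop` printing stale-db would correspond to the situation above, where `pwd_mkdb /etc/master.passwd` was the fix.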

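I have also confirmed that the installed files end up in one of two layouts, depending on the machine.
In the first layout, everything goes under a single directory.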

# ls -l  /usr/local/lib/hadoop/
total 5
-rw-r--r--   1 foo  wheel   348624 Feb 19  2010 CHANGES.txt
-rw-r--r--   1 foo  wheel    13366 Feb 19  2010 LICENSE.txt
-rw-r--r--   1 foo  wheel      101 Feb 19  2010 NOTICE.txt
-rw-r--r--   1 foo  wheel     1366 Feb 19  2010 README.txt
drwxr-xr-x   2 foo  wheel      512 Feb 19  2010 bin
-rw-r--r--   1 foo  wheel    74035 Feb 19  2010 build.xml
drwxr-xr-x   4 foo  wheel      512 Feb 19  2010 c++
drwxr-xr-x   2 foo  wheel      512 Feb 19  2010 conf
drwxr-xr-x  13 foo  wheel      512 Feb 19  2010 contrib
drwxr-xr-x   7 foo  wheel     2048 Feb 19  2010 docs
-rw-r--r--   1 foo  wheel     6839 Feb 19  2010 hadoop-0.20.2-ant.jar
-rw-r--r--   1 foo  wheel  2689741 Feb 19  2010 hadoop-0.20.2-core.jar
-rw-r--r--   1 foo  wheel   142466 Feb 19  2010 hadoop-0.20.2-examples.jar
-rw-r--r--   1 foo  wheel  1563859 Feb 19  2010 hadoop-0.20.2-test.jar
-rw-r--r--   1 foo  wheel    69940 Feb 19  2010 hadoop-0.20.2-tools.jar
drwxr-xr-x   2 foo  wheel      512 Feb 19  2010 ivy
-rw-r--r--   1 foo  wheel     8852 Feb 19  2010 ivy.xml
drwxr-xr-x   5 foo  wheel     1024 Feb 19  2010 lib
drwxr-xr-x   2 foo  wheel      512 Feb 19  2010 librecordio
drwxr-xr-x  15 foo  wheel      512 Feb 19  2010 src
drwxr-xr-x   8 foo  wheel      512 Feb 19  2010 webapps

In the second layout, the executables and the configuration files go into separate locations.

# ls -ltr /usr/local/etc/hadoop/
total 60
drwxr-xr-x  2 root  wheel   512 Sep 16 00:16 envvars.d
-r--r--r--  1 root  wheel   382 Sep 16 00:16 taskcontroller.cfg
-r--r--r--  1 root  wheel   178 Sep 16 00:16 mapred-site.xml
-r--r--r--  1 root  wheel  2033 Sep 16 00:16 mapred-queue-acls.xml
-r--r--r--  1 root  wheel  4441 Sep 16 00:16 log4j.properties
-r--r--r--  1 root  wheel   178 Sep 16 00:16 hdfs-site.xml
-r--r--r--  1 root  wheel  4644 Sep 16 00:16 hadoop-policy.xml
-r--r--r--  1 root  wheel  1488 Sep 16 00:16 hadoop-metrics2.properties
-r--r--r--  1 root  wheel  2237 Sep 16 00:16 hadoop-env.sh
-r--r--r--  1 root  wheel   178 Sep 16 00:16 core-site.xml
-r--r--r--  1 root  wheel   535 Sep 16 00:16 configuration.xsl
-r--r--r--  1 root  wheel  7457 Sep 16 00:16 capacity-scheduler.xml
# ls -ltr /usr/local/share/hadoop/
total 6548
drwxr-xr-x   9 root  wheel      512 Dec 16  2011 webapps
drwxr-xr-x   5 root  wheel     2048 Dec 16  2011 lib
-rw-r--r--   1 root  wheel   287776 Dec 16  2011 hadoop-tools-1.0.0.jar
-rw-r--r--   1 root  wheel  2530737 Dec 16  2011 hadoop-test-1.0.0.jar
-rw-r--r--   1 root  wheel  3740200 Dec 16  2011 hadoop-core-1.0.0.jar
-rw-r--r--   1 root  wheel     6840 Dec 16  2011 hadoop-ant-1.0.0.jar
drwxr-xr-x  10 root  wheel      512 Dec 16  2011 contrib
drwxr-xr-x   2 root  wheel      512 Dec 16  2011 bin    
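
To tell which of the two layouts a machine has, a check along these lines may help. It is a minimal sketch: the function name detect_hadoop_layout and the labels "single", "split" and "none" are hypothetical, and the directories tested are only the ones that appear in the listings above.

```shell
#!/bin/sh
# Hypothetical helper: guess the install layout under a prefix
# (default /usr/local) from the directories seen in the listings above.
detect_hadoop_layout() {
    prefix=${1:-/usr/local}
    if [ -d "$prefix/etc/hadoop" ] && [ -d "$prefix/share/hadoop" ]; then
        # Configuration in etc/hadoop, jars and bin in share/hadoop
        echo split
    elif [ -d "$prefix/lib/hadoop/conf" ]; then
        # Everything under lib/hadoop, including conf
        echo single
    else
        echo none
    fi
}
```

Calling `detect_hadoop_layout` with no argument inspects /usr/local. Passing another prefix is useful only for testing.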

In either case, since I had several Java versions installed, I checked which Java was used for the build.

# grep JAVA_HOME work/000.java_home.env 
export JAVA_HOME=${JAVA_HOME:-/usr/local/openjdk6}

As it happened, OpenJDK 6 was used in every one of my FreeBSD environments.
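
The `${JAVA_HOME:-/usr/local/openjdk6}` form in that file means "keep JAVA_HOME if it is already set and non-empty, otherwise fall back to /usr/local/openjdk6". The sketch below just demonstrates that expansion. The function name resolve_java_home is hypothetical, and /usr/local/openjdk7 in the usage note is a made-up path for illustration.

```shell
#!/bin/sh
# Hypothetical helper mirroring the line in work/000.java_home.env:
#   export JAVA_HOME=${JAVA_HOME:-/usr/local/openjdk6}
# Prints the value that JAVA_HOME would end up with.
resolve_java_home() {
    # ":-" uses the default when JAVA_HOME is unset OR empty
    echo "${JAVA_HOME:-/usr/local/openjdk6}"
}
```

So with JAVA_HOME unset or empty the result is /usr/local/openjdk6, while something like `JAVA_HOME=/usr/local/openjdk7` in the environment would take precedence.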

Following the Getting Started section of the Hadoop documentation, edit the configuration files.
First, try standalone mode.

# cd /usr/local/lib/hadoop/conf
or
# cd /usr/local/etc/hadoop

# emacs -nw hadoop-env.sh 
export JAVA_HOME=/usr/local/openjdk6

Then run the sample program.

# mkdir input
# cp *.xml input

# ./bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'dfs[a-z.]+'
or
# /usr/local/share/hadoop/bin/hadoop jar /usr/local/share/examples/hadoop/hadoop-examples-1.0.0.jar grep input output 'dfs[a-z.]+' 

# cat output/*
1       dfsadmin

OK!
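
For comparison, the grep example counts how often each string matching the regular expression occurs in the input files and lists the counts in descending order. A rough local equivalent with ordinary Unix tools is sketched below. This is my own approximation, not Hadoop's implementation: the function name count_matches is hypothetical, and the order of entries with equal counts may differ from what Hadoop prints.

```shell
#!/bin/sh
# Hypothetical helper: approximate the output of the Hadoop grep example
# using plain Unix tools. Prints "count<TAB>match", highest count first.
#   $1 = directory containing the input files
#   $2 = extended regular expression
count_matches() {
    dir=$1
    pattern=$2
    # -o: print only the matched text, -h: omit file names, -E: extended regex
    grep -ohE -e "$pattern" "$dir"/* |
        sort |
        uniq -c |
        sort -rn |
        awk '{ c = $1; sub(/^ *[0-9]+ /, ""); print c "\t" $0 }'
}
```

Running `count_matches input 'dfs[a-z.]+'` on the same input directory is a quick way to cross-check the result in output/.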