[Confirmed resolved] Stuck on installing Spark [6]

For now, unless I get at least a sample program running, I can't tell whether the build actually worked.

Let's go with the Monte Carlo calculation of pi. In Hadoop terms, it's the equivalent of pi in hadoop-examples.
Run it.

 export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop; export SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop2.0.0-cdh4.4.0.jar; export SPARK_YARN_APP_JAR=./examples/target/scala-2.10/spark-examples-assembly-0.9.1.jar; export SPARK_YARN_MODE=true; ./bin/run-example org.apache.spark.examples.SparkPi yarn-client

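There's absolutely no need to write this all on one line, but when I ran it without export, the way the Spark documentation shows it, the environment variables weren't passed on to run-example and I got an error.
So once the environment variables have been exported in the shell, later runs only need the run-example command itself.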
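If you'd rather not depend on what happens to be exported in the current shell, one option is to hand the variables to the child process explicitly. Here's a minimal Python sketch of that idea. The helper names `spark_yarn_env` and `run_spark_pi` are my own and not part of Spark; the paths are simply the ones from the command above, and it assumes you run it from the Spark source directory.

```python
import os
import subprocess


def spark_yarn_env(base_env=None):
    """Return a copy of the environment with the Spark-on-YARN variables added."""
    # Copy so the caller's environment (or os.environ) is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "HADOOP_CONF_DIR": "/usr/local/hadoop/etc/hadoop",
        "SPARK_JAR": "./assembly/target/scala-2.10/"
                     "spark-assembly-0.9.1-hadoop2.0.0-cdh4.4.0.jar",
        "SPARK_YARN_APP_JAR": "./examples/target/scala-2.10/"
                              "spark-examples-assembly-0.9.1.jar",
        "SPARK_YARN_MODE": "true",
    })
    return env


def run_spark_pi():
    """Launch the SparkPi example; the child sees exactly the variables in env."""
    cmd = ["./bin/run-example", "org.apache.spark.examples.SparkPi", "yarn-client"]
    return subprocess.run(cmd, env=spark_yarn_env(), check=True)
```

Calling `run_spark_pi()` then behaves like the one-liner above, with the difference that the variables are passed through `env=` instead of relying on `export`.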

14/05/12 18:31:22 INFO yarn.Client: Command for the ApplicationMaster: $JAVA_HOME/bin/java -server -Xmx640m  -Djava.io.tmpdir=$PWD/tmp  org.apache.spark.deploy.yarn.WorkerLauncher --class notused --jar ./examples/target/scala-2.10/spark-examples-assembly-0.9.1.jar --args  'sparkclient:33070'  --worker-memory 1024 --worker-cores 1 --num-workers 2 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
14/05/12 18:31:22 INFO yarn.Client: Submitting application to ASM
14/05/12 18:31:22 INFO client.YarnClientImpl: Submitted application application_1382610529109_9585 to ResourceManager at resourcemanager/192.168.1.4:8040
14/05/12 18:31:22 INFO cluster.YarnClientSchedulerBackend: Application report from ASM: 
         appMasterRpcPort: 0
         appStartTime: 1399887082601
         yarnAppState: ACCEPTED

14/05/12 18:31:37 INFO scheduler.TaskSetManager: Finished TID 0 in 3962 ms on hadoopdatanode3 (progress: 1/2)
14/05/12 18:31:37 INFO util.RackResolver: Resolved hadoopdatanode2  to /default-rack
14/05/12 18:31:37 INFO scheduler.DAGScheduler: Completed ResultTask(0, 1)
14/05/12 18:31:37 INFO scheduler.TaskSetManager: Finished TID 1 in 3846 ms on hadoopdatanode2 (progress: 2/2)
14/05/12 18:31:37 INFO scheduler.DAGScheduler: Stage 0 (reduce at SparkPi.scala:39) finished in 6.291 s
14/05/12 18:31:37 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool 
14/05/12 18:31:37 INFO spark.SparkContext: Job finished: reduce at SparkPi.scala:39, took 6.428673126 s
Pi is roughly 3.13856

Looking good! It seems to contact the resourcemanager on the Hadoop side first, and then reach two of the Hadoop data nodes.
It's really rough, but we do get an approximation of pi.
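As for why the value is that rough: the Monte Carlo method just throws random points into a square and counts how many land inside the inscribed circle, so the error shrinks only like 1/sqrt(n). Here's a small single-machine Python sketch of the same idea. It's not the SparkPi source itself; `estimate_pi` is my own name, and the figure of 200,000 samples in the usage note is an assumption, since the sample count doesn't appear in the log above.

```python
import random


def estimate_pi(num_samples, seed=None):
    """Estimate pi by sampling points uniformly in the square [-1, 1] x [-1, 1]."""
    rng = random.Random(seed)  # private generator so runs are reproducible per seed
    inside = 0
    for _ in range(num_samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        # The point lies inside the unit circle when x^2 + y^2 < 1.
        if x * x + y * y < 1.0:
            inside += 1
    # circle area / square area = pi / 4, so scale the hit ratio by 4.
    return 4.0 * inside / num_samples
```

Something like `estimate_pi(200000)` typically comes out off by a few thousandths, which is about the same ballpark as the 3.13856 above. To get one more correct digit you'd need roughly 100 times as many samples.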