[Resolved] Getting stuck installing Spark [6]
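Anyway, until I at least get a sample program running, I can't tell whether the setup actually worked.
Let's go with the Monte Carlo calculation of pi. It's the counterpart of pi in Hadoop's hadoop-examples.
Run it.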
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop; export SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop2.0.0-cdh4.4.0.jar; export SPARK_YARN_APP_JAR=./examples/target/scala-2.10/spark-examples-assembly-0.9.1.jar; export SPARK_YARN_MODE=true; ./bin/run-example org.apache.spark.examples.SparkPi yarn-client
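There's no need at all to write this on one line, but when I ran it without export, the way the Spark documentation shows it, the environment variables weren't inherited and I got an error.
So once the environment variables have been set, from then on you only need to type the run-example part.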
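As a side note on why export matters here, the snippet below is a minimal sketch of how a POSIX shell passes variables to child processes. The post does not show the command that failed, so treating it as a plain assignment without export is an assumption; DEMO_VAR is a hypothetical name used only for illustration.

```shell
# DEMO_VAR is a hypothetical variable used only for this illustration.

# 1) Plain assignment followed by ';': the variable stays inside this shell,
#    so a child process does not see it.
unset DEMO_VAR
DEMO_VAR=hello; without_export=$(sh -c 'echo "${DEMO_VAR:-unset}"')

# 2) export: every child process started afterwards inherits the variable.
export DEMO_VAR=hello
with_export=$(sh -c 'echo "${DEMO_VAR:-unset}"')

# 3) Prefix form (no ';'): the variable is passed to that single command only.
unset DEMO_VAR
with_prefix=$(DEMO_VAR=hello sh -c 'echo "${DEMO_VAR:-unset}"')

echo "no export: $without_export"
echo "export:    $with_export"
echo "prefix:    $with_prefix"
```

With export, the variables persist for the rest of the shell session, which is why only the run-example part has to be retyped afterwards.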
14/05/12 18:31:22 INFO yarn.Client: Command for the ApplicationMaster: $JAVA_HOME/bin/java -server -Xmx640m -Djava.io.tmpdir=$PWD/tmp org.apache.spark.deploy.yarn.WorkerLauncher --class notused --jar ./examples/target/scala-2.10/spark-examples-assembly-0.9.1.jar --args 'sparkclient:33070' --worker-memory 1024 --worker-cores 1 --num-workers 2 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
14/05/12 18:31:22 INFO yarn.Client: Submitting application to ASM
14/05/12 18:31:22 INFO client.YarnClientImpl: Submitted application application_1382610529109_9585 to ResourceManager at resourcemanager/192.168.1.4:8040
14/05/12 18:31:22 INFO cluster.YarnClientSchedulerBackend: Application report from ASM: appMasterRpcPort: 0 appStartTime: 1399887082601 yarnAppState: ACCEPTED
14/05/12 18:31:37 INFO scheduler.TaskSetManager: Finished TID 0 in 3962 ms on hadoopdatanode3 (progress: 1/2)
14/05/12 18:31:37 INFO util.RackResolver: Resolved hadoopdatanode2 to /default-rack
14/05/12 18:31:37 INFO scheduler.DAGScheduler: Completed ResultTask(0, 1)
14/05/12 18:31:37 INFO scheduler.TaskSetManager: Finished TID 1 in 3846 ms on hadoopdatanode2 (progress: 2/2)
14/05/12 18:31:37 INFO scheduler.DAGScheduler: Stage 0 (reduce at SparkPi.scala:39) finished in 6.291 s
14/05/12 18:31:37 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/05/12 18:31:37 INFO spark.SparkContext: Job finished: reduce at SparkPi.scala:39, took 6.428673126 s
Pi is roughly 3.13856
Looks good! It seems to contact the resourcemanager on the Hadoop side and then two of the Hadoop data nodes.
It really is rough, but an approximation of pi does come out.
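To see why the estimate is rough, here is a single-machine sketch of the same Monte Carlo idea: throw random points into the square [-1, 1] x [-1, 1] and count how many land inside the unit circle. This is not the SparkPi source. SAMPLES and SEED are hypothetical values chosen for this illustration, and awk stands in for the distributed map and reduce steps.

```shell
# Single-machine sketch of the Monte Carlo estimate; not the SparkPi source.
# SAMPLES and SEED are hypothetical knobs chosen for this illustration.
SAMPLES=200000
SEED=42

# LC_ALL=C keeps the decimal separator a '.' regardless of the user's locale.
estimate=$(LC_ALL=C awk -v n="$SAMPLES" -v seed="$SEED" 'BEGIN {
    srand(seed)
    inside = 0
    for (i = 0; i < n; i++) {
        # Random point in the square [-1, 1] x [-1, 1]
        x = rand() * 2 - 1
        y = rand() * 2 - 1
        # Count the point if it falls inside the unit circle
        if (x * x + y * y <= 1) inside++
    }
    # circle area / square area = pi / 4, so pi is about 4 * inside / n
    printf "%.5f", 4 * inside / n
}')

echo "Pi is roughly $estimate"
```

The error of this kind of estimate shrinks only with the square root of the number of samples, so getting just two or three correct digits from a few hundred thousand points is what you would expect.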