Stuck on Spark Again
Last time I tried building Spark from source and got stuck...
Stuck Installing Spark [1] - なぜか数学者にはワイン好きが多い
Stuck Installing Spark [2] - なぜか数学者にはワイン好きが多い
This time I figured I'd save myself the trouble and use the prebuilt package for CDH.
> wget http://d3kbcqa49mib13.cloudfront.net/spark-1.0.0-bin-cdh4.tgz
> su
# tar xf spark-1.0.0-bin-cdh4.tgz -C /usr/local
# cd /usr/local/
# ln -s spark-1.0.0-bin-cdh4/ spark
# exit

First, local mode:

> /usr/local/spark/bin/run-example SparkPi 3
14/07/02 16:31:35 INFO SparkContext: Starting job: reduce at SparkPi.scala:35
14/07/02 16:31:35 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:35) with 3 output partitions (allowLocal=false)
14/07/02 16:31:35 INFO DAGScheduler: Final stage: Stage 0(reduce at SparkPi.scala:35)
14/07/02 16:31:35 INFO DAGScheduler: Parents of final stage: List()
14/07/02 16:31:35 INFO DAGScheduler: Missing parents: List()
14/07/02 16:31:35 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[1] at map at SparkPi.scala:31), which has no missing parents
14/07/02 16:31:35 INFO DAGScheduler: Submitting 3 missing tasks from Stage 0 (MappedRDD[1] at map at SparkPi.scala:31)
14/07/02 16:31:35 INFO TaskSchedulerImpl: Adding task set 0.0 with 3 tasks
14/07/02 16:31:35 INFO TaskSetManager: Starting task 0.0:0 as TID 0 on executor localhost: localhost (PROCESS_LOCAL)
14/07/02 16:31:35 INFO TaskSetManager: Serialized task 0.0:0 as 1424 bytes in 2 ms
14/07/02 16:31:35 INFO TaskSetManager: Starting task 0.0:1 as TID 1 on executor localhost: localhost (PROCESS_LOCAL)
14/07/02 16:31:35 INFO TaskSetManager: Serialized task 0.0:1 as 1424 bytes in 0 ms
14/07/02 16:31:35 INFO TaskSetManager: Starting task 0.0:2 as TID 2 on executor localhost: localhost (PROCESS_LOCAL)
14/07/02 16:31:35 INFO TaskSetManager: Serialized task 0.0:2 as 1424 bytes in 1 ms
14/07/02 16:31:35 INFO Executor: Running task ID 1
14/07/02 16:31:35 INFO Executor: Running task ID 0
14/07/02 16:31:35 INFO Executor: Running task ID 2
14/07/02 16:31:35 INFO Executor: Fetching http://192.168.0.10:48205/jars/spark-examples-1.0.0-hadoop2.0.0-mr1-cdh4.2.0.jar with timestamp 1404286295464
14/07/02 16:31:35 INFO Utils: Fetching http://192.168.0.10:48205/jars/spark-examples-1.0.0-hadoop2.0.0-mr1-cdh4.2.0.jar to /tmp/fetchFileTemp6531578495995963794.tmp
14/07/02 16:31:36 INFO Executor: Adding file:/tmp/spark-3044ce77-f9a9-4370-b6a4-2fd11a69f14a/spark-examples-1.0.0-hadoop2.0.0-mr1-cdh4.2.0.jar to class loader
14/07/02 16:31:36 INFO Executor: Serialized size of result for 2 is 675
14/07/02 16:31:36 INFO Executor: Serialized size of result for 0 is 675
14/07/02 16:31:36 INFO Executor: Serialized size of result for 1 is 675
14/07/02 16:31:36 INFO Executor: Sending result for 2 directly to driver
14/07/02 16:31:36 INFO Executor: Sending result for 0 directly to driver
14/07/02 16:31:36 INFO Executor: Sending result for 1 directly to driver
14/07/02 16:31:36 INFO Executor: Finished task ID 2
14/07/02 16:31:36 INFO Executor: Finished task ID 0
14/07/02 16:31:36 INFO Executor: Finished task ID 1
14/07/02 16:31:36 INFO TaskSetManager: Finished TID 2 in 611 ms on localhost (progress: 1/3)
14/07/02 16:31:36 INFO DAGScheduler: Completed ResultTask(0, 2)
14/07/02 16:31:36 INFO DAGScheduler: Completed ResultTask(0, 0)
14/07/02 16:31:36 INFO TaskSetManager: Finished TID 0 in 627 ms on localhost (progress: 2/3)
14/07/02 16:31:36 INFO DAGScheduler: Completed ResultTask(0, 1)
14/07/02 16:31:36 INFO TaskSetManager: Finished TID 1 in 619 ms on localhost (progress: 3/3)
14/07/02 16:31:36 INFO DAGScheduler: Stage 0 (reduce at SparkPi.scala:35) finished in 0.642 s
14/07/02 16:31:36 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/07/02 16:31:36 INFO SparkContext: Job finished: reduce at SparkPi.scala:35, took 0.774402315 s
Pi is roughly 3.1403066666666666
No problems there.
Next, YARN client mode.
> MASTER=yarn-client HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop /usr/local/spark/bin/run-example SparkPi 3
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Error: Could not load YARN classes. This copy of Spark may not have been compiled with YARN support.
Run with --help for usage help or --verbose for debug output
Huh?
Is this because I cut corners?
Searching the Spark source for where this message is produced, it's around here:
if (!Utils.classIsLoadable("org.apache.spark.deploy.yarn.Client") && !Utils.isTesting) {
  val msg = "Could not load YARN classes. This copy of Spark may not have been compiled " +
    "with YARN support."
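As I understand it, Utils.classIsLoadable just asks the classloader whether the class can be found on the classpath. A minimal standalone sketch of the same idea (my own ClassCheck object, not Spark's actual code):

```scala
// Sketch of a classIsLoadable-style check: try to locate the class
// by name without initializing it, and report success or failure.
object ClassCheck {
  def isLoadable(clazz: String): Boolean =
    try {
      // initialize = false: we only care whether the class resolves
      Class.forName(clazz, false, Thread.currentThread.getContextClassLoader)
      true
    } catch {
      case _: ClassNotFoundException => false
    }

  def main(args: Array[String]): Unit = {
    println(ClassCheck.isLoadable("java.lang.String"))  // always on the classpath
    // org.apache.spark.deploy.yarn.Client resolves only if the assembly
    // on the classpath was built with YARN support
    println(ClassCheck.isLoadable("org.apache.spark.deploy.yarn.Client"))
  }
}
```

So the error simply means the YARN Client class is absent from whatever jar spark-submit put on the classpath.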
So let's check whether org.apache.spark.deploy.yarn.Client is actually in the jar.
> jar tvf lib/spark-assembly-1.0.0-hadoop2.0.0-mr1-cdh4.2.0.jar | grep org/apache/spark/deploy/yarn/Client.class
>
It's not there! (Which makes sense in hindsight: the assembly name says hadoop2.0.0-mr1, i.e. it was built against MapReduce v1, not YARN.)
Let's look at the build that, after much struggle last time, more or less worked.
[Resolved for now] Stuck Installing Spark [5] - なぜか数学者にはワイン好きが多い
> jar tvf spark-assembly_2.10-0.9.1-hadoop2.2.0.jar | grep org/apache/spark/deploy/yarn/Client.class
31306 Thu Mar 27 05:50:52 JST 2014 org/apache/spark/deploy/yarn/Client.class
There it is!
The lesson: no cutting corners... instead of leaning on prebuilt packages and binary installs, I'll knuckle down and do the source install properly.
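For the record, here's the build I plan to try, a sketch based on my reading of the Spark 1.0.0 build instructions. The exact Hadoop version string and the choice of the yarn-alpha profile for CDH4 (which is Hadoop 2.0.x-alpha based) are my assumptions; verify against the docs for your versions.

```shell
# Option 1: sbt assembly with YARN support switched on
# (SPARK_YARN=true is what the prebuilt CDH4/MR1 package evidently lacked)
SPARK_HADOOP_VERSION=2.0.0-cdh4.2.0 SPARK_YARN=true sbt/sbt assembly

# Option 2: Maven, using the yarn-alpha profile for Hadoop 2.0.x-alpha (CDH4)
mvn -Pyarn-alpha -Dhadoop.version=2.0.0-cdh4.2.0 -DskipTests clean package

# Afterwards, confirm the YARN Client class made it into the assembly
jar tf assembly/target/scala-2.10/spark-assembly-*.jar | grep deploy/yarn/Client.class
```

If the last grep prints org/apache/spark/deploy/yarn/Client.class, yarn-client mode should at least get past the "Could not load YARN classes" error.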