How I failed at building a Hadoop cluster without HDFS
I tried modifying core-site.xml.
<property>
  <name>fs.defaultFS</name>
  <!-- <value>file:///</value> -->
  <value>s3n://XXXXXXXXXXXXXXXXX:YYYYYYYYYYYYYYYYY@mybacket-01</value>
  <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>
The property keys also looked wrong to me, so I changed them as well.
<property>
  <!-- <name>fs.s3.impl</name> -->
  <name>fs.AbstractFileSystem.s3.impl</name>
  <value>org.apache.hadoop.fs.s3.S3FileSystem</value>
  <description>The FileSystem for s3: uris.</description>
</property>
<property>
  <!-- <name>fs.s3n.impl</name> -->
  <name>fs.AbstractFileSystem.s3n.impl</name>
  <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
  <description>The FileSystem for s3n: (Native S3) uris.</description>
</property>
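For reference, the stock CDH4-era bindings look roughly like the fragment below, as far as I can tell. Note that fs.SCHEME.impl and fs.AbstractFileSystem.SCHEME.impl are different mechanisms: the former names a FileSystem subclass (what hadoop fs and MapReduce use), while the latter is for the newer FileContext API and expects an AbstractFileSystem subclass, which I don't believe ships for S3 in this release.

```xml
<!-- FileSystem-level binding used by hadoop fs / MapReduce
     (this should already be the default; shown only for clarity) -->
<property>
  <name>fs.s3n.impl</name>
  <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
</property>
<!-- Credentials can be configured here instead of being embedded in the URI -->
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>XXXXXXXXXXXXXXXXX</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YYYYYYYYYYYYYYYYY</value>
</property>
```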
The hdfs command itself works normally.
$ hdfs dfs -mkdir input
$ hdfs dfs -ls
14/04/02 02:17:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwxrwxrwx   -          0 1970-01-01 00:00 input
But the example job won't run.
$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.5.0.jar pi 2 2
Number of Maps  = 2
Samples per Map = 2
14/04/03 19:10:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Starting Job
14/04/03 19:10:13 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
14/04/03 19:10:13 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
14/04/03 19:10:13 INFO mapreduce.Cluster: Failed to use org.apache.hadoop.mapred.YarnClientProtocolProvider due to error: java.lang.NoSuchMethodException: org.apache.hadoop.fs.s3.S3FileSystem.<init>(java.net.URI, org.apache.hadoop.conf.Configuration)
14/04/03 19:10:13 ERROR security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
        at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:122)
        at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:84)
        at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:77)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1239)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1235)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.hadoop.mapreduce.Job.connect(Job.java:1234)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1263)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1287)
        at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
        at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:351)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:360)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:68)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
It says java.lang.NoSuchMethodException: org.apache.hadoop.fs.s3.S3FileSystem, but the class is definitely there in both the source and the jar. Reading the message again, though, what the exception actually names is a missing constructor, <init>(java.net.URI, org.apache.hadoop.conf.Configuration), not a missing class.
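To illustrate the mechanism, here is a minimal stand-alone sketch; the classes are stand-ins of my own, not Hadoop's. My understanding is that fs.AbstractFileSystem.SCHEME.impl values are instantiated reflectively through a (URI, Configuration) constructor, so pointing that key at a FileSystem subclass, which typically has only a no-arg constructor, fails in exactly this way.

```java
import java.net.URI;

public class ReflectionDemo {
    // Stand-ins of my own, mimicking Hadoop's classes for illustration only.
    static class Configuration {}

    // Like org.apache.hadoop.fs.s3.S3FileSystem: FileSystem subclasses expose a
    // no-arg constructor and receive their URI later via initialize(uri, conf).
    static class FileSystemStyle {
        public FileSystemStyle() {}
    }

    // An AbstractFileSystem subclass must expose a (URI, Configuration)
    // constructor, because that is the signature looked up reflectively.
    static class AbstractFileSystemStyle {
        public AbstractFileSystemStyle(URI uri, Configuration conf) {}
    }

    // Mimics the lookup done for fs.AbstractFileSystem.SCHEME.impl values.
    static boolean hasUriConfConstructor(Class<?> cls) {
        try {
            cls.getConstructor(URI.class, Configuration.class);
            return true;
        } catch (NoSuchMethodException e) {
            // Same NoSuchMethodException as the one in the job log.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasUriConfConstructor(FileSystemStyle.class));         // false
        System.out.println(hasUriConfConstructor(AbstractFileSystemStyle.class)); // true
    }
}
```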
Let's try running Hive.
$ hive -e 'show databases'
Logging initialized using configuration in jar:file:/usr/local/hive-0.10.0-cdh4.5.0/lib/hive-common-0.10.0-cdh4.5.0.jar!/hive-log4j.properties
Hive history file=/tmp/hadoop/hive_job_log_78148c99-e93f-4634-b292-20a2e73904cd_398905190.txt
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.0.0-cdh4.5.0/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hive-0.10.0-cdh4.5.0/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
OK
database_name
default

$ hive -e "create database user_db"
Logging initialized using configuration in jar:file:/usr/local/hive-0.10.0-cdh4.5.0/lib/hive-common-0.10.0-cdh4.5.0.jar!/hive-log4j.properties
Hive history file=/tmp/hadoop/hive_job_log_ef5dc454-70ec-4dc0-b444-0ba8de46160a_1810542074.txt
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.0.0-cdh4.5.0/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hive-0.10.0-cdh4.5.0/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
OK
Time taken: 5.315 seconds

$ hive -e "create table user_db.user (id int, name string) row format delimited fields terminated by '\t'"
Logging initialized using configuration in jar:file:/usr/local/hive-0.10.0-cdh4.5.0/lib/hive-common-0.10.0-cdh4.5.0.jar!/hive-log4j.properties
Hive history file=/tmp/hadoop/hive_job_log_25cba8fe-84b4-4520-858a-14d0f64c36dc_262885078.txt
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.0.0-cdh4.5.0/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hive-0.10.0-cdh4.5.0/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
OK
Time taken: 6.697 seconds
Creating the database and table seems to work. Next, I prepare a TSV file for LOAD DATA and load it into the table.
$ cat /tmp/dat.tsv
123	abc
345	cde
$ hive -e 'load data local inpath "/tmp/dat.tsv" into table user_db.user'
Logging initialized using configuration in jar:file:/usr/local/hive-0.10.0-cdh4.5.0/lib/hive-common-0.10.0-cdh4.5.0.jar!/hive-log4j.properties
Hive history file=/tmp/hadoop/hive_job_log_358acc9b-afae-4970-a7a3-30b762630615_1611174169.txt
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.0.0-cdh4.5.0/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hive-0.10.0-cdh4.5.0/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Copying data from file:/tmp/dat.tsv
Copying file: file:/tmp/dat.tsv
Loading data to table user_table.user
-chgrp: '' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
Table user_table.user stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 16, raw_data_size: 0]
OK
Time taken: 8.7 seconds
A strange error shows up there: Hive seems to invoke the hadoop -chgrp command without actually passing it a group argument.
Looking at the table contents, though, the data did get loaded.
$ hive -e 'select * from user_db.user'
Logging initialized using configuration in jar:file:/usr/local/hive-0.10.0-cdh4.5.0/lib/hive-common-0.10.0-cdh4.5.0.jar!/hive-log4j.properties
Hive history file=/tmp/hadoop/hive_job_log_c15e49a8-8b96-4cdd-b5b3-c27ac8e27181_772650078.txt
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.0.0-cdh4.5.0/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hive-0.10.0-cdh4.5.0/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
OK
id	name
123	abc
345	cde
Time taken: 6.485 seconds
However, Hive answers queries as trivial as SELECT * by itself, bypassing Hadoop entirely. So the moment the query changes even slightly, it fails.
$ hive -e "select * from user_db.user where name like '%a%'"
Logging initialized using configuration in jar:file:/usr/local/hive-0.10.0-cdh4.5.0/lib/hive-common-0.10.0-cdh4.5.0.jar!/hive-log4j.properties
Hive history file=/tmp/hadoop/hive_job_log_980441f2-46a0-4447-aeb3-e286d513f92d_1468706777.txt
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.0.0-cdh4.5.0/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hive-0.10.0-cdh4.5.0/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
        at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:122)
        at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:84)
        at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:77)
        at org.apache.hadoop.mapred.JobClient.init(JobClient.java:478)
        at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:457)
        at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:426)
        at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:138)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:66)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1383)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1169)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:982)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:706)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Job Submission failed with exception 'java.io.IOException(Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
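As an aside, the direct-read shortcut mentioned above is Hive's "fetch task": simple SELECT * queries with no filter or aggregation are answered by reading the table files directly, without submitting a MapReduce job. If I remember correctly, the behavior is governed by the setting below (I have not verified this on this exact cluster, so treat it as a hypothesis):

```xml
<!-- hive-site.xml: controls which queries skip MapReduce entirely.
     "minimal" (the default, as far as I know) covers plain SELECT *,
     partition-column filters, and LIMIT; "more" widens the shortcut. -->
<property>
  <name>hive.fetch.task.conversion</name>
  <value>minimal</value>
</property>
```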
Cannot initialize Cluster yet again...
I don't fully understand this yet, but the page below offers an explanation:
http://shun0102.net/?p=198
"However, if fs.default.name does not point at HDFS, the NameNode and DataNode will not start and HDFS becomes unusable. So if you want to use both, you need to set only awsAccessKeyId and awsSecretAccessKey and keep HDFS as the default."
Given that explanation, I'm thinking of changing the section below (↓) and trying again. Since I'll be handling large files, maybe s3:// suits better than s3n://.
<property>
  <name>fs.defaultFS</name>
  <!-- <value>file:///</value> -->
  <value>s3n://XXXXXXXXXXXXXXXXX:YYYYYYYYYYYYYYYYY@mybacket-01</value>
  <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>
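Following the quoted advice, the change would presumably look like the fragment below: keep fs.defaultFS on HDFS and put only the credentials into core-site.xml. The NameNode address is a placeholder, not my actual host, and the key names shown are for the s3n scheme; the s3 block store should use fs.s3.awsAccessKeyId / fs.s3.awsSecretAccessKey analogously, as far as I know.

```xml
<property>
  <name>fs.defaultFS</name>
  <!-- placeholder NameNode address; substitute the real one -->
  <value>hdfs://namenode:8020</value>
</property>
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>XXXXXXXXXXXXXXXXX</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YYYYYYYYYYYYYYYYY</value>
</property>
```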