Enabling WebHDFS on Hadoop CDH 4.3.0
Before configuration:
$ hadoop fs -ls /
Found 4 items
drwxr-xr-x   - hadoop supergroup          0 2013-06-14 14:29 /hbase
drwxrwx---   - hadoop supergroup          0 2013-06-12 18:19 /home
drwx------   - hadoop supergroup          0 2013-06-13 14:42 /tmp
drwxr-xr-x   - hadoop supergroup          0 2013-06-12 17:52 /user
I want to view this over REST.
$ curl -i 'http://namenode:50070/webhdfs/v1/?op=LISTSTATUS'
HTTP/1.1 404 Not Found
Cache-Control: must-revalidate,no-cache,no-store
Date: Wed, 19 Jun 2013 06:26:05 GMT
Pragma: no-cache
Date: Wed, 19 Jun 2013 06:26:05 GMT
Pragma: no-cache
Content-Type: text/html; charset=iso-8859-1
Content-Length: 1376
Server: Jetty(6.1.26.cloudera.2)

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 404 NOT_FOUND</title>
</head>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing /webhdfs/v1/. Reason:
<pre>    NOT_FOUND</pre></p><hr /><i><small>Powered by Jetty://</small></i><br/>
</body>
</html>
With nothing configured, this naturally fails.
To configure it, first stop all the daemons.
ssh namenode /usr/local/hadoop/sbin/hadoop-daemon.sh --script hdfs stop namenode
ssh slave1 /usr/local/hadoop/sbin/hadoop-daemon.sh --script hdfs stop datanode
ssh slave1 /usr/local/hadoop/sbin/yarn-daemon.sh stop nodemanager
ssh slave2 /usr/local/hadoop/sbin/hadoop-daemon.sh --script hdfs stop datanode
ssh slave2 /usr/local/hadoop/sbin/yarn-daemon.sh stop nodemanager
ssh manager /usr/local/hadoop/sbin/yarn-daemon.sh stop resourcemanager
ssh manager /usr/local/hadoop/sbin/mr-jobhistory-daemon.sh stop historyserver
Edit the configuration file.
emacs etc/hadoop/hdfs-site.xml

<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
After editing, copy the file to every machine.
Then start all the daemons again.
ssh namenode /usr/local/hadoop/sbin/hadoop-daemon.sh --script hdfs start namenode
ssh slave1 /usr/local/hadoop/sbin/hadoop-daemon.sh --script hdfs start datanode
ssh slave1 /usr/local/hadoop/sbin/yarn-daemon.sh start nodemanager
ssh slave2 /usr/local/hadoop/sbin/hadoop-daemon.sh --script hdfs start datanode
ssh slave2 /usr/local/hadoop/sbin/yarn-daemon.sh start nodemanager
ssh manager /usr/local/hadoop/sbin/yarn-daemon.sh start resourcemanager
ssh manager /usr/local/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
Now for the HTTP access again...
curl -i 'http://namenode:50070/webhdfs/v1/?op=LISTSTATUS'
HTTP/1.1 200 OK
Cache-Control: no-cache
Expires: Wed, 19 Jun 2013 07:21:34 GMT
Date: Wed, 19 Jun 2013 07:21:34 GMT
Pragma: no-cache
Expires: Wed, 19 Jun 2013 07:21:34 GMT
Date: Wed, 19 Jun 2013 07:21:34 GMT
Pragma: no-cache
Content-Type: application/json
Content-Length: 787
Server: Jetty(6.1.26.cloudera.2)

{"FileStatuses":{"FileStatus":[
{"accessTime":0,"blockSize":0,"group":"supergroup","length":0,"modificationTime":1371187759396,"owner":"hadoop","pathSuffix":"hbase","permission":"755","replication":0,"type":"DIRECTORY"},
{"accessTime":0,"blockSize":0,"group":"supergroup","length":0,"modificationTime":1371028761644,"owner":"hadoop","pathSuffix":"home","permission":"770","replication":0,"type":"DIRECTORY"},
{"accessTime":0,"blockSize":0,"group":"supergroup","length":0,"modificationTime":1371102136144,"owner":"hadoop","pathSuffix":"tmp","permission":"700","replication":0,"type":"DIRECTORY"},
{"accessTime":0,"blockSize":0,"group":"supergroup","length":0,"modificationTime":1371027127605,"owner":"hadoop","pathSuffix":"user","permission":"755","replication":0,"type":"DIRECTORY"}
]}}
Looks like it worked... so that was all it took... I should have put this in the configuration from the start...
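Since the LISTSTATUS response is plain JSON, it is easy to consume from a script. A minimal Python sketch that parses a body shaped like the one above (the sample below is trimmed to two of the entries):

```python
import json

# A LISTSTATUS response body, trimmed to two of the entries shown above.
body = """{"FileStatuses":{"FileStatus":[
 {"accessTime":0,"blockSize":0,"group":"supergroup","length":0,
  "modificationTime":1371187759396,"owner":"hadoop","pathSuffix":"hbase",
  "permission":"755","replication":0,"type":"DIRECTORY"},
 {"accessTime":0,"blockSize":0,"group":"supergroup","length":0,
  "modificationTime":1371027127605,"owner":"hadoop","pathSuffix":"user",
  "permission":"755","replication":0,"type":"DIRECTORY"}
]}}"""

# The entries live under FileStatuses.FileStatus.
entries = json.loads(body)["FileStatuses"]["FileStatus"]
names = [e["pathSuffix"] for e in entries]
```

`pathSuffix` is relative to the listed directory, so for `/` the names come back as `hbase`, `user`, and so on.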
Let's try writing a file.
curl -i -X PUT 'http://namenode:50070/webhdfs/v1/user/hadoop/test?op=CREATE'
HTTP/1.1 307 TEMPORARY_REDIRECT
Cache-Control: no-cache
Expires: Wed, 19 Jun 2013 08:04:40 GMT
Date: Wed, 19 Jun 2013 08:04:40 GMT
Pragma: no-cache
Expires: Wed, 19 Jun 2013 08:04:40 GMT
Date: Wed, 19 Jun 2013 08:04:40 GMT
Pragma: no-cache
Content-Type: application/octet-stream
Location: http://slave2:50075/webhdfs/v1/user/hadoop/test?op=CREATE&namenoderpcaddress=namenode:8020&overwrite=false
Content-Length: 0
Server: Jetty(6.1.26.cloudera.2)
It pointed me at a slave (datanode) and told me to write there instead...
So I access the node it named.
curl -i -X PUT -T '-' 'http://slave2:50075/webhdfs/v1/user/hadoop/test?op=CREATE&namenoderpcaddress=namenode:8020&overwrite=false'
HTTP/1.1 100 Continue

Hello, world
HTTP/1.1 403 Forbidden
Cache-Control: no-cache
Expires: Wed, 19 Jun 2013 08:09:33 GMT
Date: Wed, 19 Jun 2013 08:09:33 GMT
Pragma: no-cache
Expires: Wed, 19 Jun 2013 08:09:33 GMT
Date: Wed, 19 Jun 2013 08:09:33 GMT
Pragma: no-cache
Content-Type: application/json
Transfer-Encoding: chunked
Server: Jetty(6.1.26.cloudera.2)

{"RemoteException":{"exception":"AccessControlException","javaClassName":"org.apache.hadoop.security.AccessControlException","message":"Permission denied: user=dr.who, access=WRITE, inode=\"/user/hadoop\":hadoop:supergroup:drwxr-xr-x\n\tat org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:224)\n\tat org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:204)\n\tat org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:149)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4716)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4698)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:4672)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1839)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:1771)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1747)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:418)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:207)\n\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44942)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)\n\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)\n\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1701)\n\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1697)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:415)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:1695)\n"}}
Hmm?
Permission denied: user=dr.who?????
Since I haven't configured security, it seems passing a user name in the query string is all that's needed (dr.who is just the default web user when no authentication is set up).
Instead of Dr. Who, let's use the hadoop user whose account I actually created.
curl -i -X PUT -T '-' 'http://slave2:50075/webhdfs/v1/user/hadoop/test?user.name=hadoop&op=CREATE&namenoderpcaddress=namenode:8020&overwrite=false'
HTTP/1.1 100 Continue

Hello, world
HTTP/1.1 201 Created
Cache-Control: no-cache
Expires: Wed, 19 Jun 2013 08:23:44 GMT
Date: Wed, 19 Jun 2013 08:23:44 GMT
Pragma: no-cache
Expires: Wed, 19 Jun 2013 08:23:44 GMT
Date: Wed, 19 Jun 2013 08:23:44 GMT
Pragma: no-cache
Content-Type: application/octet-stream
Location: webhdfs://0.0.0.0:50070/user/hadoop/test
Content-Length: 0
Server: Jetty(6.1.26.cloudera.2)
Looking good.
And indeed:
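Scripting the write means following the same two-step dance: PUT to the namenode with op=CREATE, take the datanode URL from the Location header of the 307, then PUT the data there. A sketch of step 1's URL construction; `create_url` is a hypothetical helper, not part of any Hadoop library:

```python
from urllib.parse import urlencode

def create_url(host, port, path, user, overwrite=False):
    """Build the step-1 WebHDFS CREATE URL sent to the namenode.

    `user` becomes the user.name query parameter; without it the
    request runs as the default dr.who user and gets a 403.
    """
    query = urlencode({
        "op": "CREATE",
        "user.name": user,
        "overwrite": str(overwrite).lower(),
    })
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"

url = create_url("namenode", 50070, "/user/hadoop/test", "hadoop")
```

The namenode answers this URL with a 307 whose Location header points at a datanode; the file body goes in a second PUT to that location.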
hdfs dfs -ls /user/hadoop
-rw-r--r--   3 hadoop supergroup         13 2013-06-19 17:23 /user/hadoop/test
hdfs dfs -cat /user/hadoop/test
Hello, world
While we're at it, reading it back via WebHDFS:
curl -i -L 'http://namenode:50070/webhdfs/v1/user/hadoop/test?op=OPEN'
HTTP/1.1 307 TEMPORARY_REDIRECT
Cache-Control: no-cache
Expires: Wed, 19 Jun 2013 08:27:52 GMT
Date: Wed, 19 Jun 2013 08:27:52 GMT
Pragma: no-cache
Expires: Wed, 19 Jun 2013 08:27:52 GMT
Date: Wed, 19 Jun 2013 08:27:52 GMT
Pragma: no-cache
Content-Type: application/octet-stream
Location: http://slave2:50075/webhdfs/v1/user/hadoop/test?op=OPEN&namenoderpcaddress=namenode:8020&offset=0
Content-Length: 0
Server: Jetty(6.1.26.cloudera.2)

HTTP/1.1 200 OK
Cache-Control: no-cache
Expires: Wed, 19 Jun 2013 08:27:52 GMT
Date: Wed, 19 Jun 2013 08:27:52 GMT
Pragma: no-cache
Expires: Wed, 19 Jun 2013 08:27:52 GMT
Date: Wed, 19 Jun 2013 08:27:52 GMT
Pragma: no-cache
Content-Type: application/octet-stream
Content-Length: 13
Server: Jetty(6.1.26.cloudera.2)

Hello, world
Here the -L option makes curl follow the Location redirect itself, so there's no need to run a second command.
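All -L does under the hood is read the Location header and re-issue the request against the datanode. If you handle the redirect yourself, the header is easy to pick apart with the standard library; the Location value below is copied from the 307 response above:

```python
from urllib.parse import urlsplit, parse_qs

# Location header from the namenode's 307 response above.
location = ("http://slave2:50075/webhdfs/v1/user/hadoop/test"
            "?op=OPEN&namenoderpcaddress=namenode:8020&offset=0")

parts = urlsplit(location)
datanode = parts.netloc           # host:port of the datanode to read from
params = parse_qs(parts.query)    # query parameters the datanode expects
```

Doing this by hand is useful when you want to stream the body yourself or pin the read to a particular offset.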