Enabling WebHDFS on Hadoop CDH4.3.0

Before the change:

$ hadoop fs -ls /
Found 4 items
drwxr-xr-x   - hadoop supergroup          0 2013-06-14 14:29 /hbase
drwxrwx---   - hadoop supergroup          0 2013-06-12 18:19 /home
drwx------   - hadoop supergroup          0 2013-06-13 14:42 /tmp
drwxr-xr-x   - hadoop supergroup          0 2013-06-12 17:52 /user

↑ I want to see this listing over REST.

$ curl -i 'http://namenode:50070/webhdfs/v1/?op=LISTSTATUS'
HTTP/1.1 404 Not Found
Cache-Control: must-revalidate,no-cache,no-store
Date: Wed, 19 Jun 2013 06:26:05 GMT
Pragma: no-cache
Date: Wed, 19 Jun 2013 06:26:05 GMT
Pragma: no-cache
Content-Type: text/html; charset=iso-8859-1
Content-Length: 1376
Server: Jetty(6.1.26.cloudera.2)

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 404 NOT_FOUND</title>
</head>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing /webhdfs/v1/. Reason:
<pre>    NOT_FOUND</pre></p><hr /><i><small>Powered by Jetty://</small></i><br/>                                                
</body>
</html>

Nothing is configured yet, so of course this is an error.
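
The request above follows the pattern http://HOST:PORT/webhdfs/v1/PATH?op=OPERATION. As a rough sketch, the helper below assembles such a URL so the later calls are easier to read. webhdfs_url is a made-up name, not something Hadoop ships; the host and port are the ones used in this post.

```shell
# webhdfs_url HOST PORT HDFS_PATH OP
# Hypothetical helper: prints http://HOST:PORT/webhdfs/v1HDFS_PATH?op=OP.
# HDFS_PATH is expected to be absolute, so it already starts with "/".
webhdfs_url() {
  printf 'http://%s:%s/webhdfs/v1%s?op=%s\n' "$1" "$2" "$3" "$4"
}

# Example (needs a reachable NameNode, so it is left commented out):
# curl -i "$(webhdfs_url namenode 50070 / LISTSTATUS)"
```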


To change the setting, I first stop all the daemons.

ssh namenode /usr/local/hadoop/sbin/hadoop-daemon.sh --script hdfs stop namenode
ssh slave1 /usr/local/hadoop/sbin/hadoop-daemon.sh --script hdfs stop datanode
ssh slave1 /usr/local/hadoop/sbin/yarn-daemon.sh stop nodemanager
ssh slave2 /usr/local/hadoop/sbin/hadoop-daemon.sh --script hdfs stop datanode
ssh slave2 /usr/local/hadoop/sbin/yarn-daemon.sh stop nodemanager
ssh manager /usr/local/hadoop/sbin/yarn-daemon.sh stop resourcemanager
ssh manager /usr/local/hadoop/sbin/mr-jobhistory-daemon.sh stop historyserver
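
The seven commands above differ only in host, script, and daemon name, so they can be generated. A minimal sketch, assuming the same host names and install path as above; daemon_commands is a hypothetical helper that only prints the commands, so nothing is stopped until its output is piped to sh.

```shell
# daemon_commands ACTION   (ACTION is "stop" or "start")
# Hypothetical helper: prints, but does not run, one ssh command per daemon.
daemon_commands() {
  action=$1
  sbin=/usr/local/hadoop/sbin
  echo "ssh namenode $sbin/hadoop-daemon.sh --script hdfs $action namenode"
  for host in slave1 slave2; do
    echo "ssh $host $sbin/hadoop-daemon.sh --script hdfs $action datanode"
    echo "ssh $host $sbin/yarn-daemon.sh $action nodemanager"
  done
  echo "ssh manager $sbin/yarn-daemon.sh $action resourcemanager"
  echo "ssh manager $sbin/mr-jobhistory-daemon.sh $action historyserver"
}

# Dry run: show what would be executed.
daemon_commands stop
```

To actually run them, pipe the output to sh (daemon_commands stop | sh). The same function covers the restart with daemon_commands start.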

Edit the configuration file.

emacs etc/hadoop/hdfs-site.xml


<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
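
Before copying the file around, a quick grep can confirm the edit landed. This is only a sketch: webhdfs_enabled is a hypothetical helper, and it assumes the value line directly follows the name line inside the property element of hdfs-site.xml.

```shell
# webhdfs_enabled FILE
# Hypothetical helper: succeeds if FILE sets dfs.webhdfs.enabled to true.
# Assumes <value> is on the line right after <name>.
webhdfs_enabled() {
  grep -A1 '<name>dfs.webhdfs.enabled</name>' "$1" | grep -q '<value>true</value>'
}

# Example, run from the Hadoop install directory:
# webhdfs_enabled etc/hadoop/hdfs-site.xml && echo "WebHDFS is enabled"
```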

After editing, copy the file to all machines.
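
Copying by hand gets tedious as nodes are added. A small sketch, assuming the file was edited on namenode and that every machine keeps its config under /usr/local/hadoop/etc/hadoop (inferred from the sbin path above, so adjust as needed). copy_commands is a hypothetical helper that only prints the scp commands.

```shell
# Assumed location of the edited file; the same path is assumed on every host.
conf=/usr/local/hadoop/etc/hadoop/hdfs-site.xml

# copy_commands HOST...
# Hypothetical helper: prints one scp command per target host.
copy_commands() {
  for host in "$@"; do
    echo "scp $conf $host:$conf"
  done
}

# Dry run for the other three machines in this cluster.
copy_commands slave1 slave2 manager
```

Piping the output to sh performs the copy.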

Start all the daemons again.

ssh namenode /usr/local/hadoop/sbin/hadoop-daemon.sh --script hdfs start namenode
ssh slave1 /usr/local/hadoop/sbin/hadoop-daemon.sh --script hdfs start datanode
ssh slave1 /usr/local/hadoop/sbin/yarn-daemon.sh start nodemanager
ssh slave2 /usr/local/hadoop/sbin/hadoop-daemon.sh --script hdfs start datanode
ssh slave2 /usr/local/hadoop/sbin/yarn-daemon.sh start nodemanager
ssh manager /usr/local/hadoop/sbin/yarn-daemon.sh start resourcemanager
ssh manager /usr/local/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver

And now the HTTP request...

curl -i 'http://namenode:50070/webhdfs/v1/?op=LISTSTATUS'
HTTP/1.1 200 OK
Cache-Control: no-cache
Expires: Wed, 19 Jun 2013 07:21:34 GMT
Date: Wed, 19 Jun 2013 07:21:34 GMT
Pragma: no-cache
Expires: Wed, 19 Jun 2013 07:21:34 GMT
Date: Wed, 19 Jun 2013 07:21:34 GMT
Pragma: no-cache
Content-Type: application/json
Content-Length: 787
Server: Jetty(6.1.26.cloudera.2)

{"FileStatuses":{"FileStatus":[
{"accessTime":0,"blockSize":0,"group":"supergroup","length":0,"modificationTime":1371187759396,"owner":"hadoop","pathSuffix":"hbase","permission":"755","replication":0,"type":"DIRECTORY"},
{"accessTime":0,"blockSize":0,"group":"supergroup","length":0,"modificationTime":1371028761644,"owner":"hadoop","pathSuffix":"home","permission":"770","replication":0,"type":"DIRECTORY"},
{"accessTime":0,"blockSize":0,"group":"supergroup","length":0,"modificationTime":1371102136144,"owner":"hadoop","pathSuffix":"tmp","permission":"700","replication":0,"type":"DIRECTORY"},
{"accessTime":0,"blockSize":0,"group":"supergroup","length":0,"modificationTime":1371027127605,"owner":"hadoop","pathSuffix":"user","permission":"755","replication":0,"type":"DIRECTORY"}
]}}

Looks like it worked... Was that really all it took? I should have put this in the config from the start...
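
The JSON carries the same information as hadoop fs -ls: pathSuffix is the entry name and modificationTime is milliseconds since the Unix epoch. Here is a rough sketch for pulling those out with standard tools. list_names and ms_to_utc are hypothetical helpers; ms_to_utc assumes GNU date, and a real JSON parser would be more robust than grep.

```shell
# list_names: read a LISTSTATUS response on stdin, print one entry name per line.
list_names() {
  grep -o '"pathSuffix":"[^"]*"' | cut -d'"' -f4
}

# ms_to_utc MILLISECONDS: print the timestamp in UTC (GNU date assumed).
ms_to_utc() {
  date -u -d "@$(( $1 / 1000 ))" '+%Y-%m-%d %H:%M'
}

# Examples (the first needs a running cluster, so both are commented out):
# curl -s 'http://namenode:50070/webhdfs/v1/?op=LISTSTATUS' | list_names
# ms_to_utc 1371187759396
```

For /hbase, ms_to_utc 1371187759396 gives 2013-06-14 05:29 in UTC, which is the 14:29 JST that hadoop fs -ls showed.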

Let's try writing a file.

curl -i -X PUT 'http://namenode:50070/webhdfs/v1/user/hadoop/test?op=CREATE'
HTTP/1.1 307 TEMPORARY_REDIRECT
Cache-Control: no-cache
Expires: Wed, 19 Jun 2013 08:04:40 GMT
Date: Wed, 19 Jun 2013 08:04:40 GMT
Pragma: no-cache
Expires: Wed, 19 Jun 2013 08:04:40 GMT
Date: Wed, 19 Jun 2013 08:04:40 GMT
Pragma: no-cache
Content-Type: application/octet-stream
Location: http://slave2:50075/webhdfs/v1/user/hadoop/test?op=CREATE&namenoderpcaddress=namenode:8020&overwrite=false
Content-Length: 0
Server: Jetty(6.1.26.cloudera.2)

It pointed me at a slave (DataNode) and told me to write there instead...
So I send the request to the address it gave me.
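
Creating a file is a two-step exchange: the NameNode replies 307 with a Location header naming a DataNode, and the data goes to that URL in a second PUT. The sketch below pulls the Location value out of a response so the second request can be scripted. extract_location is a hypothetical helper.

```shell
# extract_location: read HTTP response headers on stdin, print the Location value.
# Strips the carriage returns that HTTP headers end with.
extract_location() {
  tr -d '\r' | sed -n 's/^[Ll]ocation: *//p' | head -n 1
}

# Example (needs a running cluster, so it is commented out):
# url=$(curl -s -i -X PUT 'http://namenode:50070/webhdfs/v1/user/hadoop/test?op=CREATE' | extract_location)
# curl -i -X PUT -T - "$url"
```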

curl -i -X PUT -T '-' 'http://slave2:50075/webhdfs/v1/user/hadoop/test?op=CREATE&namenoderpcaddress=namenode:8020&overwrite=false'
HTTP/1.1 100 Continue

Hello, world
HTTP/1.1 403 Forbidden
Cache-Control: no-cache
Expires: Wed, 19 Jun 2013 08:09:33 GMT
Date: Wed, 19 Jun 2013 08:09:33 GMT
Pragma: no-cache
Expires: Wed, 19 Jun 2013 08:09:33 GMT
Date: Wed, 19 Jun 2013 08:09:33 GMT
Pragma: no-cache
Content-Type: application/json
Transfer-Encoding: chunked
Server: Jetty(6.1.26.cloudera.2)

{"RemoteException":{"exception":"AccessControlException","javaClassName":"org.apache.hadoop.security.AccessControlException","message":"Permission denied: user=dr.who, access=WRITE, inode=\"/user/hadoop\":hadoop:supergroup:drwxr-xr-x\n\tat org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:224)\n\tat org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:204)\n\tat org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:149)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4716)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4698)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:4672)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1839)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:1771)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1747)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:418)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:207)\n\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44942)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)\n\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)\n\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1701)\n\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1697)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:415)\n\tat 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:1695)\n"}}

Hm?
Permission denied: user=dr.who?????

No security is configured, so apparently it is enough to pass just a user name.
Instead of Dr. Who, I'll use hadoop, the user the account was created as.
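
dr.who appears to be the fallback user that Hadoop's web endpoints use when a request names nobody. On a cluster without security, the user.name query parameter overrides it. A small sketch for appending it to any WebHDFS URL; with_user is a hypothetical helper name.

```shell
# with_user URL USER
# Hypothetical helper: appends user.name=USER, using "&" if the URL
# already has a query string and "?" otherwise.
with_user() {
  case $1 in
    *\?*) printf '%s&user.name=%s\n' "$1" "$2" ;;
    *)    printf '%s?user.name=%s\n' "$1" "$2" ;;
  esac
}

# Example (needs a running cluster, so it is commented out):
# curl -i -X PUT -T - "$(with_user 'http://slave2:50075/webhdfs/v1/user/hadoop/test?op=CREATE&namenoderpcaddress=namenode:8020&overwrite=false' hadoop)"
```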

curl -i -X PUT -T '-' 'http://slave2:50075/webhdfs/v1/user/hadoop/test?user.name=hadoop&op=CREATE&namenoderpcaddress=namenode:8020&overwrite=false'
HTTP/1.1 100 Continue

Hello, world
HTTP/1.1 201 Created
Cache-Control: no-cache
Expires: Wed, 19 Jun 2013 08:23:44 GMT
Date: Wed, 19 Jun 2013 08:23:44 GMT
Pragma: no-cache
Expires: Wed, 19 Jun 2013 08:23:44 GMT
Date: Wed, 19 Jun 2013 08:23:44 GMT
Pragma: no-cache
Content-Type: application/octet-stream
Location: webhdfs://0.0.0.0:50070/user/hadoop/test
Content-Length: 0
Server: Jetty(6.1.26.cloudera.2)

Looks good.

And then:

hdfs dfs -ls /user/hadoop
-rw-r--r--   3 hadoop supergroup         13 2013-06-19 17:23 /user/hadoop/test

hdfs dfs -cat /user/hadoop/test
Hello, world
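
The 13 in the listing is the file size in bytes: the 12 characters of Hello, world plus, presumably, the newline sent when Enter was pressed. A quick local check:

```shell
# Count the bytes in "Hello, world" followed by a newline.
printf 'Hello, world\n' | wc -c
```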

While I'm at it, reading it back through WebHDFS:

curl -i -L 'http://namenode:50070/webhdfs/v1/user/hadoop/test?op=OPEN'
HTTP/1.1 307 TEMPORARY_REDIRECT
Cache-Control: no-cache
Expires: Wed, 19 Jun 2013 08:27:52 GMT
Date: Wed, 19 Jun 2013 08:27:52 GMT
Pragma: no-cache
Expires: Wed, 19 Jun 2013 08:27:52 GMT
Date: Wed, 19 Jun 2013 08:27:52 GMT
Pragma: no-cache
Content-Type: application/octet-stream
Location: http://slave2:50075/webhdfs/v1/user/hadoop/test?op=OPEN&namenoderpcaddress=namenode:8020&offset=0
Content-Length: 0
Server: Jetty(6.1.26.cloudera.2)

HTTP/1.1 200 OK
Cache-Control: no-cache
Expires: Wed, 19 Jun 2013 08:27:52 GMT
Date: Wed, 19 Jun 2013 08:27:52 GMT
Pragma: no-cache
Expires: Wed, 19 Jun 2013 08:27:52 GMT
Date: Wed, 19 Jun 2013 08:27:52 GMT
Pragma: no-cache
Content-Type: application/octet-stream
Content-Length: 13
Server: Jetty(6.1.26.cloudera.2)

Hello, world

This time the -L option makes curl follow the Location redirect, so it looks like there is no need to run the command twice.