Running the Dempsy sample: real-time processing for Big Data

This post is about Dempsy, whose documentation I skimmed a while ago.
Streaming Processing for Big Data - なぜか数学者にはワイン好きが多い
Taking things in order, I got it running before Storm, which I had also been wanting to try.

According to the documentation:

Prerequisites

You will need Java 1.6 or higher.

There is no official Oracle Java for FreeBSD, though...

> java -version
openjdk version "1.6.0_32"
OpenJDK Runtime Environment (build 1.6.0_32-b25) 
OpenJDK Client VM (build 20.0-b12, mixed mode)

Will OpenJDK do, I wonder...

To build an application against Dempsy you will need to add the Dempsy dependencies to your build. This should be as simple as including the following dependency in your maven pom.xml file (or the gradle equivalent).

Maven is needed too.

> mvn --version 
Apache Maven 2.2.1 (r801777; 2009-08-07 04:16:01+0900)
Java version: 1.6.0_32 
Java home: /usr/local/openjdk6/jre
Default locale: en, platform encoding: ISO8859-1
OS name: "freebsd" version: "9.0-release" arch: "i386" Family: "unix" 

Fetch the Dempsy sample programs with git.

> git clone git://github.com/Dempsy/Dempsy-examples.git Dempsy-examples
Cloning into 'Dempsy-examples'...
remote: Counting objects: 216, done.
remote: Compressing objects: 100% (103/103), done.
remote: Total 216 (delta 47), reused 210 (delta 41)
Receiving objects: 100% (216/216), 1.56 MiB | 446 KiB/s, done.
Resolving deltas: 100% (47/47), done.

(With https or http instead of the git protocol, the clone failed with error: Could not resolve host: github.com. I have not finished looking into why.)

> cd Dempsy-examples
> mvn install
> ls -l userguide-wordcount/target/userguide-wordcount-1.0-SNAPSHOT.jar
-rw-r--r--  1 joe  wheel  1359200 Aug  5 19:26 userguide-wordcount/target/userguide-wordcount-1.0-SNAPSHOT.jar

The build succeeds, and the dependency libraries, Spring included, should now have been downloaded under ~/.m2/.

Adding each jar to the Java classpath one by one is tedious, so I ran mkdir lib and placed in it the minimum set of libraries needed, taken from the ones Maven fetched:

> ls lib
commons-io-1.4.jar                      lib-dempsyimpl-0.7.jar                  metrics-ganglia-2.0.2.jar               slf4j-log4j12-1.6.4.jar                 spring-context-3.0.6.RELEASE.jar
commons-logging-1.1.1.jar               lib-dempsyspring-0.7.jar                metrics-graphite-2.0.2.jar              spring-aop-3.0.6.RELEASE.jar            spring-core-3.0.6.RELEASE.jar
lib-dempsyapi-0.7.jar                   log4j-1.2.14.jar                        quartz-2.0.1.jar                        spring-asm-3.0.6.RELEASE.jar            spring-expression-3.0.6.RELEASE.jar
lib-dempsycore-0.7.jar                  metrics-core-2.0.2.jar                  slf4j-api-1.6.4.jar                     spring-beans-3.0.6.RELEASE.jar
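
The copy into lib was done by hand in this post. As a rough sketch of how it could be automated, the following walks the local Maven repository and copies jars whose file names start with given prefixes. The class name CollectJarsSketch, the method copyJars, and the prefix list are all hypothetical and only derived from the listing above. Prefix matching is deliberately crude (it would also pick up other versions or -sources jars if present), and the sketch needs Java 8 or later rather than the Java 1.6 used here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Dempsy or Maven.
class CollectJarsSketch {

    // Copies every *.jar under repo whose file name starts with one of the
    // prefixes into libDir, and returns the sorted names of the copied files.
    static List<String> copyJars(Path repo, Path libDir, List<String> prefixes) {
        try (Stream<Path> files = Files.walk(repo)) {
            Files.createDirectories(libDir);
            List<String> copied = new ArrayList<>();
            // Collect first so the walk is finished before anything is copied.
            for (Path p : files.filter(Files::isRegularFile).collect(Collectors.toList())) {
                String name = p.getFileName().toString();
                boolean wanted = name.endsWith(".jar")
                        && prefixes.stream().anyMatch(name::startsWith);
                if (wanted) {
                    Files.copy(p, libDir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
                    copied.add(name);
                }
            }
            Collections.sort(copied);
            return copied;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // ~/.m2/repository is Maven's default local repository location.
        Path repo = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        // Prefixes guessed from the ls output above (assumption).
        List<String> prefixes = Arrays.asList(
                "commons-io", "commons-logging", "lib-dempsy", "log4j",
                "metrics-", "quartz", "slf4j-", "spring-");
        for (String name : copyJars(repo, Paths.get("lib"), prefixes)) {
            System.out.println(name);
        }
    }
}
```

Run from the Dempsy-examples directory, this would fill ./lib; the result should be checked against the listing above, since extra jars may slip in.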

With that in place, this is all it takes to run it:

> cp userguide-wordcount/src/main/resources/WordCount.xml DempsyApplicationContext-WordCount.xml
> java -Dapplication=WordCount -cp .:./lib/\*:./userguide-wordcount/target/\* com.nokia.dempsy.spring.RunAppInVm

2012-08-08 21:12:54,356 [main] INFO  ClassPathXmlApplicationContext - Refreshing org.springframework.context.support.ClassPathXmlApplicationContext@1df5a8f: startup date [Wed Aug 08 21:12:54 JST 2012]; root of context hierarchy
2012-08-08 21:12:54,434 [main] INFO  XmlBeanDefinitionReader - Loading XML bean definitions from class path resource [DempsyApplicationContext-WordCount.xml]
2012-08-08 21:12:54,619 [main] INFO  XmlBeanDefinitionReader - Loading XML bean definitions from class path resource [Dempsy-localVm.xml]
2012-08-08 21:12:54,723 [main] INFO  DefaultListableBeanFactory - Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@1f48262: defining beans [com.nokia.dempsy.config.ApplicationDefinition#0,properties,localVMContainerClusterSessionFactory,Dempsy]; root of factory hierarchy
2012-08-08 21:12:55,006 [Adaptor - "com.nokia.dempsy.example.userguide.wordcount.WordAdaptor@17918f0" of type "com.nokia.dempsy.example.userguide.wordcount.WordAdaptor"] INFO  Dempsy - Starting adaptor thread for "com.nokia.dempsy.example.userguide.wordcount.WordAdaptor@17918f0" of type "com.nokia.dempsy.example.userguide.wordcount.WordAdaptor"
2012-08-08 21:12:55,170 [main] INFO  SimpleThreadPool - Job execution threads will use class loader of thread: main
2012-08-08 21:12:55,192 [main] INFO  SchedulerSignalerImpl - Initialized Scheduler Signaller of type: class org.quartz.core.SchedulerSignalerImpl
2012-08-08 21:12:55,194 [main] INFO  QuartzScheduler - Quartz Scheduler v.2.0.1 created.
2012-08-08 21:12:55,196 [main] INFO  RAMJobStore - RAMJobStore initialized.
2012-08-08 21:12:55,202 [main] INFO  QuartzScheduler - Scheduler meta-data: Quartz Scheduler (v2.0.1) 'DefaultQuartzScheduler' with instanceId 'NON_CLUSTERED'
  Scheduler class: 'org.quartz.core.QuartzScheduler' - running locally.
  NOT STARTED.
  Currently in standby mode.
  Number of jobs executed: 0
  Using thread pool 'org.quartz.simpl.SimpleThreadPool' - with 10 threads.
  Using job-store 'org.quartz.simpl.RAMJobStore' - which does not support persistence. and is not clustered.

2012-08-08 21:12:55,203 [main] INFO  StdSchedulerFactory - Quartz scheduler 'DefaultQuartzScheduler' initialized from default resource file in Quartz package: 'quartz.properties'
2012-08-08 21:12:55,204 [main] INFO  StdSchedulerFactory - Quartz scheduler version: 2.0.1
2012-08-08 21:12:55,219 [main] INFO  QuartzScheduler - Scheduler DefaultQuartzScheduler_$_NON_CLUSTERED started.
2012-08-08 21:12:55,220 [main] INFO  QuartzScheduler - Scheduler DefaultQuartzScheduler_$_NON_CLUSTERED started.
2012-08-08 21:12:57,342 [Timer-0] INFO  UpdateChecker - New Quartz update(s) found: 2.1.4 [http://www.terracotta.org/kit/reflector?kitID=default&pageID=QuartzChangeLog]
2012-08-08 21:13:04,666 [Adaptor - "com.nokia.dempsy.example.userguide.wordcount.WordAdaptor@17918f0" of type "com.nokia.dempsy.example.userguide.wordcount.WordAdaptor"] INFO  Dempsy - Adaptor thread for "com.nokia.dempsy.example.userguide.wordcount.WordAdaptor@17918f0" of type "com.nokia.dempsy.example.userguide.wordcount.WordAdaptor" is shutting down
And:8783
in:7738
he:5819
unto:5626
a:5147
with:3822
LORD:3319
will:2239
as:1774
him:1754
(snip)
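
The reason for copying WordCount.xml to DempsyApplicationContext-WordCount.xml before the run can be read off the log: with -Dapplication=WordCount, Spring reports loading the class path resource DempsyApplicationContext-WordCount.xml. The sketch below only restates that naming pattern. It is an inference from the command and the log, not code taken from RunAppInVm, and the class and method names are hypothetical.

```java
// Hypothetical illustration of the naming pattern seen in the log above.
class ContextNameSketch {

    // Assumption: the context file name is built from the "application"
    // system property as DempsyApplicationContext-<application>.xml.
    static String contextResourceFor(String application) {
        if (application == null || application.isEmpty()) {
            throw new IllegalArgumentException("application name must not be empty");
        }
        return "DempsyApplicationContext-" + application + ".xml";
    }

    public static void main(String[] args) {
        // Falls back to WordCount when -Dapplication=... is not given.
        String app = System.getProperty("application", "WordCount");
        String resource = contextResourceFor(app);
        // Check whether a file of that name is visible on the classpath.
        boolean found = ContextNameSketch.class.getClassLoader().getResource(resource) != null;
        System.out.println(resource + (found ? " is on the classpath" : " is not on the classpath"));
    }
}
```

This also explains why "." is the first entry of -cp in the command above: the copied XML file sits in the current directory.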

For a valuable walkthrough in Japanese, I relied on the following:
Dempsyアプリケーションを動作させてみます!(ローカル版 - Taste of Tech Topics

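The Adaptor simply reads the file at a leisurely pace and hands what it reads to the MessageProcessor. Compared with putting a file into Hadoop and then running a job, this made it clear that the data is being processed as a stream.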
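
To make that contrast concrete, here is a minimal plain-JDK sketch of the same shape: one thread plays the adaptor and emits words one at a time, while the receiving side updates counts as each word arrives instead of waiting for the whole input. It uses no Dempsy classes. The names StreamingWordCountSketch, count, and END are hypothetical, and keeping all counts in a single map is a simplification for brevity, not a description of how Dempsy organizes its processors.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Plain-JDK imitation of the adaptor -> processor flow; not Dempsy code.
class StreamingWordCountSketch {

    // Hypothetical end-of-stream marker; cannot collide with a real word
    // because words are split on whitespace and never contain NUL.
    private static final String END = "\u0000END";

    static Map<String, Long> count(final List<String> lines) {
        final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        // "Adaptor" side: reads the input and emits one message per word.
        Thread adaptor = new Thread(() -> {
            for (String line : lines) {
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        queue.add(word);
                    }
                }
            }
            queue.add(END);
        }, "adaptor");
        adaptor.start();

        // "Processor" side: handles each word as soon as it arrives.
        Map<String, Long> counts = new TreeMap<>();
        try {
            for (String word = queue.take(); !word.equals(END); word = queue.take()) {
                counts.merge(word, 1L, Long::sum);
            }
            adaptor.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while counting", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = count(Arrays.asList(
                "And he said unto him",
                "And the LORD said"));
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            System.out.println(e.getKey() + ":" + e.getValue());
        }
    }
}
```

The point of the queue is that counting starts while the input is still being read, which is the difference from the put-then-run-a-job pattern mentioned above.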