Counters in Hadoop Streaming - なぜか数学者にはワイン好きが多い

In Java MapReduce terms, this is what I wanted to do.

    private static enum ResultsCount {
        YES,
        NO,
    };
    @Override
    public final void reduce(final LongWritable key, final Iterable<LongWritable> values, final Context output)
            throws IOException {


       output.getCounter(ResultsCount.YES).increment(1);
    }

According to the documentation, this looks easy to do.

Hadoop Streaming

How do I update counters in streaming applications?
A streaming process can use the stderr to emit counter information. reporter:counter:<group>,<counter>,<amount> should be sent to stderr to update the counter.
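As a rough illustration of that format, a counter update is a single line made of the prefix reporter:counter: followed by the counter group, the counter name, and the amount to add, separated by commas. The helper name `counter_line` below is made up for this sketch and is not part of Hadoop.

```ruby
# Builds one counter-update line for Hadoop Streaming.
# `counter_line` and its argument names are illustrative, not a Hadoop API.
def counter_line(group, counter, amount)
  "reporter:counter:#{group},#{counter},#{amount}"
end

# Hadoop Streaming picks these lines up from the task's stderr.
$stderr.puts counter_line("ResultsCount", "YES", 1)
```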

I wrote the script in Ruby.

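    results_count_yes = 0
    STDIN.each_line do |line|

      # Running total of input lines seen by this mapper
      results_count_yes += 1
      # Send the running total to stderr as the counter amount on every line
      STDERR.puts "reporter:counter:ResultsCount,YES,#{results_count_yes}"

    end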

I ran it against sample.log stored on HDFS. The counter seems to show up in the output.

    $ hadoop jar $HADOOP_HOME/tools/lib/hadoop-streaming-2.0.0-cdh4.4.0.jar -input log/sample.log -output output -mapper mapper.rb -file mapper.rb
    (lots of output)
        ResultsCount
                YES=19201351

* Addendum (2014/08/16): What I wrote here is wrong, so please refer to the latest version:
How I got the counter implementation wrong in Hadoop Streaming - なぜか数学者にはワイン好きが多い
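
Hadoop Streaming adds the amount in each reporter:counter line to the counter. The mapper above sends its running total on every input line, so it ends up with a value far larger than the number of lines. Below is a minimal sketch of one way around this, assuming the goal is simply to count input lines: count locally and report the total once. The method name `report_line_count` is made up for illustration, and this may or may not be the fix described in the linked follow-up post.

```ruby
require "stringio"

# Counts the lines of `input` and reports the total to `err` once at the end.
# `report_line_count` is an illustrative name, not a Hadoop API.
def report_line_count(input, err)
  count = 0
  input.each_line { |_line| count += 1 }
  # The amount is added to the counter, so it is sent only once here.
  err.puts "reporter:counter:ResultsCount,YES,#{count}"
  count
end

# Demonstration with in-memory streams; a real mapper would pass $stdin and $stderr.
err = StringIO.new
report_line_count(StringIO.new("a\nb\nc\n"), err)
puts err.string # prints "reporter:counter:ResultsCount,YES,3"
```

An alternative with the same effect is to keep the per-line `STDERR.puts` but always send an amount of 1 instead of the running total.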