The MultipleOutputs class simplifies writing to additional outputs other than the job default output via the OutputCollector passed to the map() and reduce() methods of the Mapper and Reducer implementations.
Each additional output, or named output, may be configured with its own OutputFormat, with its own key class and with its own value class.
A named output can be a single file or a multi file. The latter is referred to as a multi named output.
A multi named output is an unbound set of files all sharing the same OutputFormat, key class and value class configuration.
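The relationship between a named output and its files can be pictured with a small toy model. This is only a sketch in plain Java, not Hadoop code: NamedOutputRegistry, Config, open and membersOf are hypothetical names invented for illustration, and the class names are kept as plain strings. It shows that a single named output has exactly one member, while a multi named output gains a new member for each multi name, all under one shared configuration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: NamedOutputRegistry and its methods are hypothetical,
// not the Hadoop MultipleOutputs implementation.
final class NamedOutputRegistry {

    // Settings shared by every file of one named output.
    static final class Config {
        final String outputFormat;
        final String keyClass;
        final String valueClass;
        final boolean multi;

        Config(String outputFormat, String keyClass, String valueClass, boolean multi) {
            this.outputFormat = outputFormat;
            this.keyClass = keyClass;
            this.valueClass = valueClass;
            this.multi = multi;
        }
    }

    private final Map<String, Config> configs = new HashMap<>();
    private final Map<String, Set<String>> members = new HashMap<>();

    void addNamedOutput(String name, String format, String keyClass, String valueClass) {
        configs.put(name, new Config(format, keyClass, valueClass, false));
    }

    void addMultiNamedOutput(String name, String format, String keyClass, String valueClass) {
        configs.put(name, new Config(format, keyClass, valueClass, true));
    }

    // Returns the shared configuration and records which member was opened.
    // A multi name is required for multi named outputs and rejected otherwise.
    Config open(String name, String multiName) {
        Config config = configs.get(name);
        if (config == null) {
            throw new IllegalArgumentException("Undefined named output: " + name);
        }
        if (config.multi == (multiName == null)) {
            throw new IllegalArgumentException("Wrong use of multi name for: " + name);
        }
        members.computeIfAbsent(name, k -> new TreeSet<>())
               .add(config.multi ? multiName : name);
        return config;
    }

    Set<String> membersOf(String name) {
        return members.getOrDefault(name, new TreeSet<>());
    }

    public static void main(String[] args) {
        NamedOutputRegistry registry = new NamedOutputRegistry();
        registry.addNamedOutput("text", "TextOutputFormat", "LongWritable", "Text");
        registry.addMultiNamedOutput("seq", "SequenceFileOutputFormat", "LongWritable", "Text");
        registry.open("text", null);
        registry.open("seq", "A");
        registry.open("seq", "B");
        System.out.println(registry.membersOf("text")); // prints [text]
        System.out.println(registry.membersOf("seq"));  // prints [A, B]
    }
}
```

The set of members for "seq" is not declared up front; it grows whenever a new multi name is used, which is what "unbound" means here.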
When named outputs are used within a Mapper implementation, key/values written to a named output are not part of the reduce phase; only key/values written to the job OutputCollector are part of the reduce phase.
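The map-side behaviour can be illustrated with a plain Java toy model. This is a sketch under stated assumptions, not Hadoop code: MapSideOutputs, its two lists and the rule "empty values go to the named output" are all hypothetical and stand in for a real Mapper, the job OutputCollector and a named output collector.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: stands in for a Mapper that writes both to the job
// OutputCollector and to a named output. All names are hypothetical.
final class MapSideOutputs {
    // Records collected here would be shuffled to the reducers.
    private final List<String> jobCollector = new ArrayList<>();
    // Records collected here go straight to the named output and skip the reduce phase.
    private final List<String> namedOutput = new ArrayList<>();

    // Stand-in for map(): records with an empty value are treated as
    // rejects and written to the named output only.
    void map(String key, String value) {
        if (value.isEmpty()) {
            namedOutput.add(key);
        } else {
            jobCollector.add(key + "\t" + value);
        }
    }

    // Only what went through the job collector reaches the reduce phase.
    List<String> reducerInput() {
        return new ArrayList<>(jobCollector);
    }

    List<String> namedOutputContents() {
        return new ArrayList<>(namedOutput);
    }

    public static void main(String[] args) {
        MapSideOutputs outputs = new MapSideOutputs();
        outputs.map("a", "1");
        outputs.map("b", "");
        System.out.println(outputs.reducerInput().size());        // prints 1
        System.out.println(outputs.namedOutputContents().size()); // prints 1
    }
}
```

In this model the record with key "b" never appears in the reducer input, mirroring how key/values written to a named output in a Mapper bypass the reduce phase.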
MultipleOutputs supports counters; by default they are disabled. The counters group is the MultipleOutputs class name. The names of the counters are the same as the named outputs. For multi named outputs the name of the counter is the concatenation of the named output, an underscore '_' and the multi name.
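The counter naming rule can be sketched in a few lines of plain Java. NamedOutputCounters below is a hypothetical helper written for illustration, not part of Hadoop; it only models the naming rule and the "disabled by default" switch described above. In a real job, counters are switched on through the job configuration with MultipleOutputs.setCountersEnabled(conf, true).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: models how counter names are
// derived from named outputs and multi names.
final class NamedOutputCounters {
    private final Map<String, Long> counters = new LinkedHashMap<>();
    private final boolean enabled;

    NamedOutputCounters(boolean enabled) {
        this.enabled = enabled;
    }

    // Single named output: the counter name is the named output itself.
    // Multi named output: named output + '_' + multi name.
    static String counterName(String namedOutput, String multiName) {
        return multiName == null ? namedOutput : namedOutput + "_" + multiName;
    }

    // Counts one written record, but only when counters are enabled.
    void recordWrite(String namedOutput, String multiName) {
        if (!enabled) {
            return;
        }
        counters.merge(counterName(namedOutput, multiName), 1L, Long::sum);
    }

    Map<String, Long> snapshot() {
        return new LinkedHashMap<>(counters);
    }

    public static void main(String[] args) {
        NamedOutputCounters counters = new NamedOutputCounters(true);
        counters.recordWrite("text", null);
        counters.recordWrite("seq", "A");
        counters.recordWrite("seq", "A");
        counters.recordWrite("seq", "B");
        System.out.println(counters.snapshot()); // prints {text=1, seq_A=2, seq_B=1}
    }
}
```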
Job configuration usage pattern is:
    JobConf conf = new JobConf();

    FileInputFormat.setInputPaths(conf, inDir);
    FileOutputFormat.setOutputPath(conf, outDir);

    conf.setMapperClass(MOMap.class);
    conf.setReducerClass(MOReduce.class);
    ...

    // Defines additional single text based output 'text' for the job
    MultipleOutputs.addNamedOutput(conf, "text", TextOutputFormat.class,
        LongWritable.class, Text.class);

    // Defines additional multi sequencefile based output 'seq' for the job
    MultipleOutputs.addMultiNamedOutput(conf, "seq",
        SequenceFileOutputFormat.class,
        LongWritable.class, Text.class);
    ...

    // Pass the configuration so the client is initialized before submitting
    JobClient jc = new JobClient(conf);
    RunningJob job = jc.submitJob(conf);
    ...
Usage pattern in a Reducer is:
    public class MOReduce implements
        Reducer<WritableComparable, Writable, WritableComparable, Writable> {

      private MultipleOutputs mos;

      public void configure(JobConf conf) {
        ...
        mos = new MultipleOutputs(conf);
      }

      public void reduce(WritableComparable key, Iterator<Writable> values,
          OutputCollector<WritableComparable, Writable> output,
          Reporter reporter) throws IOException {
        ...
        // Single named output
        mos.getCollector("text", reporter).collect(key, new Text("Hello"));

        // Multi named output
        mos.getCollector("seq", "A", reporter).collect(key, new Text("Bye"));
        mos.getCollector("seq", "B", reporter).collect(key, new Text("Chau"));
        ...
      }

      public void close() throws IOException {
        // Closes all the named outputs opened by this instance
        mos.close();
        ...
      }
    }