The MultipleOutputs class simplifies writing to additional outputs other than the job default output via the OutputCollector passed to the map() and reduce() methods of the Mapper and Reducer implementations.
Each additional output, or named output, may be configured with its own OutputFormat, its own key class and its own value class.
A named output can be a single file or multiple files. The latter is referred to as a multi named output.
A multi named output is an unbounded set of files that all share the same OutputFormat, key class and value class configuration.
When named outputs are used within a Mapper implementation, key/values written to a named output are not part of the reduce phase; only key/values written to the job OutputCollector are part of the reduce phase.
MultipleOutputs supports counters; by default they are disabled.
The counters group is the MultipleOutputs class name.
The names of the counters are the same as the named outputs.
For multi named outputs, the name of the counter is the concatenation of the named output, an underscore '_' and the multiname.
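The counter naming rule above can be sketched in plain Java. The class NamedOutputCounters and its counterName methods are hypothetical helpers written for illustration; they are not part of the Hadoop API and only mirror the rule as stated in the text.

```java
// Hypothetical helper, not part of Hadoop: it only mirrors the
// counter naming rule described in the text above.
final class NamedOutputCounters {

    private NamedOutputCounters() {}

    /** Counter name for a single named output: the named output itself. */
    static String counterName(String namedOutput) {
        return namedOutput;
    }

    /** Counter name for a multi named output: namedOutput + '_' + multiName. */
    static String counterName(String namedOutput, String multiName) {
        return namedOutput + "_" + multiName;
    }

    public static void main(String[] args) {
        System.out.println(counterName("text"));      // prints "text"
        System.out.println(counterName("seq", "A"));  // prints "seq_A"
        System.out.println(counterName("seq", "B"));  // prints "seq_B"
    }
}
```

With the named outputs used in the patterns below, a job with counters enabled would therefore be expected to report counters named text, seq_A and seq_B. In the old mapred API, counters are switched on in the job configuration, which should be possible with MultipleOutputs.setCountersEnabled(conf, true).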
Job configuration usage pattern is:
JobConf conf = new JobConf();

FileInputFormat.setInputPaths(conf, inDir);
FileOutputFormat.setOutputPath(conf, outDir);

conf.setMapperClass(MOMap.class);
conf.setReducerClass(MOReduce.class);
...

// Defines an additional single, text-based output 'text' for the job
MultipleOutputs.addNamedOutput(conf, "text", TextOutputFormat.class,
    LongWritable.class, Text.class);

// Defines an additional multi, SequenceFile-based output 'seq' for the job
MultipleOutputs.addMultiNamedOutput(conf, "seq",
    SequenceFileOutputFormat.class,
    LongWritable.class, Text.class);
...

JobClient jc = new JobClient(conf);
RunningJob job = jc.submitJob(conf);
...
Usage pattern in a Reducer implementation is:
public class MOReduce implements
    Reducer<WritableComparable, Writable, WritableComparable, Writable> {

  private MultipleOutputs mos;

  public void configure(JobConf conf) {
    ...
    mos = new MultipleOutputs(conf);
  }

  public void reduce(WritableComparable key, Iterator<Writable> values,
      OutputCollector<WritableComparable, Writable> output, Reporter reporter)
      throws IOException {
    ...
    // Single named output
    mos.getCollector("text", reporter).collect(key, new Text("Hello"));
    // Multi named output
    mos.getCollector("seq", "A", reporter).collect(key, new Text("Bye"));
    mos.getCollector("seq", "B", reporter).collect(key, new Text("Chau"));
    ...
  }

  public void close() throws IOException {
    // Closes all the named outputs opened by this task
    mos.close();
    ...
  }
}
See also