Posts in the 'Resources/Java' category (page 2)

[Hadoop] When progress is reported 2010.04.06
[Hadoop] GenericOptionsParser, ToolRunner class options 2010.04.05
[HADOOP] MultipleOutputs :: Writing to multiple outputs 2010.03.24
[Java] Viewing the contents of a jar 2010.02.19
[Link] Java Persistence API 2010.02.18

[Hadoop] When progress is reported

gilbird 2010. 4. 6. 21:33

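Sometimes a Mapper or Reducer sits at the same percentage with no visible change.
Keep the cases below in mind so that a task's progress can always be seen.

Operations that count as progress

Reading an input record
Writing an output record
Setting the status description with the Reporter class's setStatus() method
Incrementing a counter with the Reporter class's incrCounter() method
Calling the Reporter class's progress() method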
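
This matters most for map() or reduce() bodies that work for a long time without reading or writing records: Hadoop treats a task that reports nothing for mapred.task.timeout milliseconds (10 minutes by default) as hung and kills it. A minimal sketch of the habit follows. So that it compiles without Hadoop on the classpath, it uses a local ProgressSink interface as a hypothetical stand-in for the three org.apache.hadoop.mapred.Reporter methods listed above; processSlowly, REPORT_EVERY, and the counter names are illustrative assumptions as well.

```java
import java.util.ArrayList;
import java.util.List;

final class ProgressSketch {
    /** Local stand-in (an assumption) for the Reporter methods named in the post. */
    interface ProgressSink {
        void setStatus(String status);
        void incrCounter(String group, String counter, long amount);
        void progress();
    }

    /** How many items to process between two reports (illustrative value). */
    static final int REPORT_EVERY = 1000;

    /**
     * Simulates long per-item work that reads and writes no records,
     * reporting after every REPORT_EVERY items. Returns the number of reports made.
     */
    static int processSlowly(int items, ProgressSink reporter) {
        int reports = 0;
        for (int i = 1; i <= items; i++) {
            // ... expensive work on item i would go here ...
            if (i % REPORT_EVERY == 0) {
                // Any one of these calls counts as progress; all three are shown.
                reporter.incrCounter("sketch", "items", REPORT_EVERY);
                reporter.setStatus("processed " + i + " of " + items);
                reporter.progress();
                reports++;
            }
        }
        return reports;
    }

    public static void main(String[] args) {
        List<String> statuses = new ArrayList<>();
        ProgressSink sink = new ProgressSink() {
            public void setStatus(String status) { statuses.add(status); }
            public void incrCounter(String group, String counter, long amount) { }
            public void progress() { }
        };
        int reports = processSlowly(2500, sink);
        System.out.println(reports + " reports: " + statuses);
    }
}
```

In a real job the reporter argument passed to map() or reduce() would take the place of the stand-in.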

[Hadoop] GenericOptionsParser, ToolRunner class options

gilbird 2010. 4. 5. 18:20

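-D property=value

Sets the given property to the given value, overriding the default

-conf filename

Adds the file to the list of configuration resources
Handy for site-specific settings

-fs uri

Sets the default filesystem
Shorthand for -D fs.default.name=uri

-jt host:port

Sets the JobTracker
Shorthand for -D mapred.job.tracker=host:port

-files file1,file2,…

Copies the listed local files to HDFS so that MapReduce tasks can use them
See page 239

-archives archive1,archive2,…

Copies the listed archives to HDFS; they are unpacked for the tasks

-libjars jar1,jar2,…

Copies the listed jars from the local filesystem to HDFS
After copying, adds them to the classpath of the MapReduce tasks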
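
To make the shorthand relationships above concrete (-fs and -jt are just -D settings for fs.default.name and mapred.job.tracker), here is a toy model of the argument handling. It is not Hadoop's GenericOptionsParser: the class GenericOptionsSketch and its parse method are hypothetical, only -D, -fs and -jt are handled, and the host names in main are made-up example values. In a real job these options are handled for you when the program implements Tool and is started through ToolRunner.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GenericOptionsSketch {
    /** Properties set by the recognised options, in the order they were given. */
    final Map<String, String> properties = new LinkedHashMap<>();
    /** Arguments that are left over for the application itself. */
    final List<String> remaining = new ArrayList<>();

    /** Toy parser: understands "-D key=value", "-fs uri" and "-jt host:port" only. */
    static GenericOptionsSketch parse(String... args) {
        GenericOptionsSketch result = new GenericOptionsSketch();
        int i = 0;
        while (i < args.length) {
            String arg = args[i];
            boolean hasValue = i + 1 < args.length;
            if (arg.equals("-D") && hasValue) {
                // Split only on the first '=' so the value may contain '=' itself.
                String[] kv = args[i + 1].split("=", 2);
                result.properties.put(kv[0], kv.length > 1 ? kv[1] : "");
                i += 2;
            } else if (arg.equals("-fs") && hasValue) {
                result.properties.put("fs.default.name", args[i + 1]);
                i += 2;
            } else if (arg.equals("-jt") && hasValue) {
                result.properties.put("mapred.job.tracker", args[i + 1]);
                i += 2;
            } else {
                // Anything this toy does not recognise is passed through unchanged.
                result.remaining.add(arg);
                i++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        GenericOptionsSketch parsed = parse(
            "-D", "mapred.reduce.tasks=4",
            "-fs", "hdfs://namenode:8020",
            "-jt", "jobtracker:8021",
            "input", "output");
        System.out.println(parsed.properties);
        System.out.println(parsed.remaining);
    }
}
```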

[HADOOP] MultipleOutputs :: Writing to multiple outputs

gilbird 2010. 3. 24. 21:40

The MultipleOutputs class

simplifies writing to additional outputs other than the job default output
via the OutputCollector passed to the map() and reduce() methods of the Mapper and Reducer implementations.

Each additional output, or named output, may be configured with its own OutputFormat, with its own key class and with its own value class.

A named output can be a single file or a multi file. The latter is referred to as a multi named output.

A multi named output is an unbounded set of files all sharing the same OutputFormat, key class and value class configuration.

When named outputs are used within a Mapper implementation, key/values written to a named output are not part of the reduce phase; only key/values written to the job OutputCollector are part of the reduce phase.

MultipleOutputs supports counters; by default they are disabled.
The counters group is the MultipleOutputs class name.

The names of the counters are the same as the named outputs.
For multi named outputs the name of the counter is the concatenation of the named output, an underscore '_' and the multiname.
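
The counter naming rule can be written down as a small helper. This is only an illustration of the rule as stated above; counterName is a hypothetical name and not part of the MultipleOutputs API. (The API documentation lists MultipleOutputs.setCountersEnabled(conf, true) as the way to turn the counters on.)

```java
final class CounterNameSketch {
    /** Counter name for a single named output: the named output itself. */
    static String counterName(String namedOutput) {
        return namedOutput;
    }

    /** Counter name for a multi named output: named output, '_', then the multiname. */
    static String counterName(String namedOutput, String multiName) {
        return namedOutput + "_" + multiName;
    }

    public static void main(String[] args) {
        System.out.println(counterName("text"));     // prints "text"
        System.out.println(counterName("seq", "A")); // prints "seq_A"
    }
}
```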

Job configuration usage pattern is:

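 // JobConf, JobClient, RunningJob, FileInputFormat, FileOutputFormat,
 // TextOutputFormat and SequenceFileOutputFormat are in org.apache.hadoop.mapred;
 // MultipleOutputs is in org.apache.hadoop.mapred.lib;
 // LongWritable and Text are in org.apache.hadoop.io.
 JobConf conf = new JobConf();

 // Input and output directories of the job (inDir and outDir are Path objects)
 FileInputFormat.setInputPaths(conf, inDir);
 FileOutputFormat.setOutputPath(conf, outDir);

 conf.setMapperClass(MOMap.class);
 conf.setReducerClass(MOReduce.class);
 ...

 // Defines an additional single text-based output 'text' for the job
 MultipleOutputs.addNamedOutput(conf, "text", TextOutputFormat.class,
     LongWritable.class, Text.class);

 // Defines an additional multi SequenceFile-based output 'seq' for the job
 MultipleOutputs.addMultiNamedOutput(conf, "seq",
     SequenceFileOutputFormat.class,
     LongWritable.class, Text.class);
 ...

 // Submit the job; the client is created from the job configuration
 JobClient jc = new JobClient(conf);
 RunningJob job = jc.submitJob(conf);

 ...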

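Usage pattern in a Reducer is:

 import java.io.IOException;
 import java.util.Iterator;

 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.io.Writable;
 import org.apache.hadoop.io.WritableComparable;
 import org.apache.hadoop.mapred.JobConf;
 import org.apache.hadoop.mapred.OutputCollector;
 import org.apache.hadoop.mapred.Reducer;
 import org.apache.hadoop.mapred.Reporter;
 import org.apache.hadoop.mapred.lib.MultipleOutputs;

 public class MOReduce
     implements Reducer<WritableComparable, Writable, WritableComparable, Writable> {
     private MultipleOutputs mos;

     // Called once per task: create the MultipleOutputs helper from the job configuration
     public void configure(JobConf conf) {
         ...
         mos = new MultipleOutputs(conf);
     }

     public void reduce(WritableComparable key, Iterator<Writable> values,
         OutputCollector<WritableComparable, Writable> output, Reporter reporter)
         throws IOException {
         ...

         // Single named output
         mos.getCollector("text", reporter).collect(key, new Text("Hello"));

         // Multi named output: one set of files per multiname ("A", "B")
         mos.getCollector("seq", "A", reporter).collect(key, new Text("Bye"));
         mos.getCollector("seq", "B", reporter).collect(key, new Text("Chau"));
         ...
     }

     // Closing MultipleOutputs closes all the named outputs it opened
     public void close() throws IOException {
         mos.close();
         ...
     }

 }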

Reference

Hadoop 0.20.0 API documentation: Class MultipleOutputs

[Java] Viewing the contents of a jar

gilbird 2010. 2. 19. 18:57

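Sometimes you want to see the list of files inside a JAR. In that case, run jar with the tf options (t lists the table of contents, f names the archive file).

jar -tf filename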
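
The same listing can be produced from Java code with the standard java.util.jar classes. The sketch below is one way to do it under a few assumptions: the class and method names (JarListingSketch, listEntries, createSampleJar) are made up for illustration, and the throwaway sample jar exists only so the example can run without an argument.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

final class JarListingSketch {
    /** Returns the entry names of the given jar, much like the listing from "jar -tf". */
    static List<String> listEntries(File jar) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jarFile = new JarFile(jar)) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    /** Builds a small temporary jar so the sketch can run on its own. */
    static File createSampleJar() throws IOException {
        File jar = File.createTempFile("sample", ".jar");
        jar.deleteOnExit();
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar))) {
            out.putNextEntry(new JarEntry("com/example/Hello.class"));
            out.closeEntry();
            out.putNextEntry(new JarEntry("README.txt"));
            out.write("hello".getBytes("UTF-8"));
            out.closeEntry();
        }
        return jar;
    }

    public static void main(String[] args) throws IOException {
        // Use the jar given on the command line, or fall back to the sample jar.
        File jar = args.length > 0 ? new File(args[0]) : createSampleJar();
        for (String name : listEntries(jar)) {
            System.out.println(name);
        }
    }
}
```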

[Link] Java Persistence API

gilbird 2010. 2. 18. 07:30

http://java.sun.com/javaee/technologies/persistence.jsp
