The encoding may be the problem.

Add the following line to eclipse.ini.


-Dfile.encoding=UTF8



Save the file, restart Eclipse, and the output should display correctly.
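To verify the setting took effect, you can print the JVM's default charset from Scala (a quick sanity check, not part of the original post):

```scala
// Prints the JVM's file.encoding property and the resolved default charset.
// With the eclipse.ini line above applied, both should report UTF-8.
object EncodingCheck extends App {
  println(System.getProperty("file.encoding"))
  println(java.nio.charset.Charset.defaultCharset().name())
}
```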

Add a shebang to the very first line of the .scala file.


#!/usr/bin/env scala

println("hello")


Make the file executable.


$ chmod a+x test.scala


Now it runs directly.


$ ./test.scala

hello
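Scala scripts run this way also receive their command-line arguments through the predefined `args` array; a minimal sketch (the file name `greet.scala` is hypothetical):

```scala
#!/usr/bin/env scala
// Greets the first command-line argument, falling back to "world".
// Run as: ./greet.scala  or  ./greet.scala scala
println(s"hello, ${args.headOption.getOrElse("world")}")
```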



One constraint when calling saveAsTextFile from spark-shell is that the output directory must not already contain files.

So the old files have to be deleted first; for S3, a single hadoop fs -rmr command does the job.

To run a hadoop command from inside spark-shell, do the following.


scala> import sys.process._

import sys.process._


scala> "hadoop fs -rmr s3://bucket-name/directory-to-delete" !



You could also use the AWS S3 Java SDK, but that's more hassle.
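The trailing `!` from sys.process returns the command's exit code (and `!!` captures its stdout), so you can check whether the delete actually succeeded. A sketch, with a harmless `echo` standing in for the hadoop call so it runs anywhere:

```scala
import sys.process._

// `!` runs the command and returns its exit code; `!!` returns its stdout.
// In the post's setting the command would be the hadoop fs -rmr line above;
// `echo` is used here only so the sketch runs without a hadoop install.
val code = Seq("echo", "cleaning up s3 output dir").!
if (code != 0) println(s"command failed with exit code $code")
```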

References
* http://alvinalexander.com/scala/scala-execute-exec-external-system-commands-in-scala


Let's put a Range into a Date. (2016.02.11)


$ spark-shell --packages "io.lamma:lamma_2.11:2.3.0"

Ivy Default Cache set to: /home/hadoop/.ivy2/cache

The jars for the packages stored in: /home/hadoop/.ivy2/jars

:: loading settings :: url = jar:file:/usr/lib/spark/lib/spark-assembly-1.6.0-hadoop2.7.1-amzn-0.jar!/org/apache/ivy/core/settings/ivysettings.xml

joda-time#joda-time added as a dependency

io.lamma#lamma_2.11 added as a dependency

:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0

        confs: [default]

        found joda-time#joda-time;2.9.2 in central

        found io.lamma#lamma_2.11;2.3.0 in central

downloading https://repo1.maven.org/maven2/io/lamma/lamma_2.11/2.3.0/lamma_2.11-2.3.0.jar ...

        [SUCCESSFUL ] io.lamma#lamma_2.11;2.3.0!lamma_2.11.jar (68ms)

:: resolution report :: resolve 1799ms :: artifacts dl 78ms

        :: modules in use:

        io.lamma#lamma_2.11;2.3.0 from central in [default]

        joda-time#joda-time;2.9.2 from central in [default]

        ---------------------------------------------------------------------

        |                  |            modules            ||   artifacts   |

        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|

        ---------------------------------------------------------------------

        |      default     |   2   |   1   |   1   |   0   ||   2   |   1   |

        ---------------------------------------------------------------------

:: retrieving :: org.apache.spark#spark-submit-parent

        confs: [default]

        1 artifacts copied, 1 already retrieved (338kB/13ms)

Welcome to

      ____              __

     / __/__  ___ _____/ /__

    _\ \/ _ \/ _ `/ __/  '_/

   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0

      /_/


Using Scala version 2.10.5 (OpenJDK 64-Bit Server VM, Java 1.7.0_91)

Type in expressions to have them evaluated.

Type :help for more information.

16/02/11 05:49:39 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.

Spark context available as sc.

Thu Feb 11 05:49:54 UTC 2016 Thread[main,5,main] java.io.FileNotFoundException: derby.log (Permission denied)

----------------------------------------------------------------

Thu Feb 11 05:49:55 UTC 2016:

Booting Derby version The Apache Software Foundation - Apache Derby - 10.10.1.1 - (1458268): instance a816c00e-0152-cee0-f31c-000025557508

on database directory /mnt/tmp/spark-9e5a20ed-acea-4dc8-ab4b-5803c4a3dca9/metastore with class loader sun.misc.Launcher$AppClassLoader@753d556f

Loaded from file:/usr/lib/spark/lib/spark-assembly-1.6.0-hadoop2.7.1-amzn-0.jar

java.vendor=Oracle Corporation

java.runtime.version=1.7.0_91-mockbuild_2015_10_27_19_01-b00

user.dir=/etc/spark/conf.dist

os.name=Linux

os.arch=amd64

os.version=4.1.13-19.30.amzn1.x86_64

derby.system.home=null

Database Class Loader started - derby.database.classpath=''

16/02/11 05:50:04 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0

16/02/11 05:50:04 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException

SQL context available as sqlContext.


scala> import io.lamma.Date

import io.lamma.Date


scala> Date("2016-02-01") to Date("2016-02-11") foreach println

Date(2016,2,1)

Date(2016,2,2)

Date(2016,2,3)

Date(2016,2,4)

Date(2016,2,5)

Date(2016,2,6)

Date(2016,2,7)

Date(2016,2,8)

Date(2016,2,9)

Date(2016,2,10)

Date(2016,2,11)


scala>
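For reference, the same inclusive day-by-day range can be built with java.time alone, without pulling in the lamma dependency (an alternative sketch, not from the original post):

```scala
import java.time.LocalDate

// Walk from start to end inclusive, one day at a time.
val start = LocalDate.of(2016, 2, 1)
val end   = LocalDate.of(2016, 2, 11)
val days  = Iterator.iterate(start)(_.plusDays(1)).takeWhile(!_.isAfter(end)).toList
days.foreach(println)   // prints 2016-02-01 through 2016-02-11
```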



Let's install Scala the easy way using SDKMAN.


First, install SDKMAN.


$ curl -s get.sdkman.io | bash


Then open a new terminal, or make sdkman available in the current one with:

$ source "$HOME/.sdkman/bin/sdkman-init.sh"

Install Scala with the following command and you're done.

[~]# sdk install scala

==== BROADCAST =================================================================

* 08/02/16: Gradle 2.11 released on SDKMAN! #gradle

* 06/02/16: Vertx 3.2.1 released on SDKMAN! #vertx

* 04/02/16: Kotlin 1.0.0-rc-1036 released on SDKMAN! #kotlin

================================================================================


Downloading: scala 2.11.7


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current

                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0

100 27.1M  100 27.1M    0     0   360k      0  0:01:17  0:01:17 --:--:--  344k


Installing: scala 2.11.7

Done installing!


Do you want scala 2.11.7 to be set as default? (Y/n):  y


Setting scala 2.11.7 as default.

[~]# which scala

/Users/gilbird/.sdkman/candidates/scala/current/bin/scala





Visit the following site to download sbt.


http://www.scala-sbt.org/download.html


After extracting the archive and moving into the bin directory, I ran sbt; it spent a long time downloading its dependencies.


[~/gilbird/scala/sbt/bin]# ./sbt

Getting org.scala-sbt sbt 0.13.9 ...

downloading https://repo.typesafe.com/typesafe/ivy-releases/org.scala-sbt/sbt/0.13.9/jars/sbt.jar ...

[SUCCESSFUL ] org.scala-sbt#sbt;0.13.9!sbt.jar (4724ms)

downloading https://jcenter.bintray.com/org/scala-lang/scala-library/2.10.5/scala-library-2.10.5.jar ...

[SUCCESSFUL ] org.scala-lang#scala-library;2.10.5!scala-library.jar (2985ms)

downloading https://repo.typesafe.com/typesafe/ivy-releases/org.scala-sbt/main/0.13.9/jars/main.jar ...

[SUCCESSFUL ] org.scala-sbt#main;0.13.9!main.jar (9263ms)

downloading https://repo.typesafe.com/typesafe/ivy-releases/org.scala-sbt/compiler-interface/0.13.9/jars/compiler-interface-bin.jar ...

[SUCCESSFUL ] org.scala-sbt#compiler-interface;0.13.9!compiler-interface-bin.jar (6848ms)

downloading https://repo.typesafe.com/typesafe/ivy-releases/org.scala-sbt/compiler-interface/0.13.9/jars/compiler-interface-src.jar ...

[SUCCESSFUL ] org.scala-sbt#compiler-interface;0.13.9!compiler-interface-src.jar (4634ms)

downloading https://repo.typesafe.com/typesafe/ivy-releases/org.scala-sbt/precompiled-2_8_2/0.13.9/jars/compiler-interface-bin.jar ...

[SUCCESSFUL ] org.scala-sbt#precompiled-2_8_2;0.13.9!compiler-interface-bin.jar (6416ms)

downloading https://repo.typesafe.com/typesafe/ivy-releases/org.scala-sbt/precompiled-2_9_2/0.13.9/jars/compiler-interface-bin.jar ...

[SUCCESSFUL ] org.scala-sbt#precompiled-2_9_2;0.13.9!compiler-interface-bin.jar (5475ms)

...



Running it again confirms it starts immediately, since the dependencies are already downloaded.

Register the path in .profile so sbt can be run from any directory.


export SBT_HOME=[full path]/scala/sbt

export PATH=$SBT_HOME/bin:$PATH




To enter multi-line input in the Scala shell, do the following.

  1. Type :paste (:p also works)
  2. Enter the multi-line input
  3. Press Ctrl + D to end input


Reference: http://alvinalexander.com/scala/how-to-enter-paste-multiline-commands-statements-into-scala-repl
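:paste matters most for definitions that only compile as a single block, such as mutually recursive methods; pasting this as one unit works where line-by-line entry would fail on the forward reference:

```scala
// Mutually recursive: isEven refers to isOdd before isOdd exists,
// so the REPL needs both definitions in one :paste block.
def isEven(n: Int): Boolean = if (n == 0) true  else isOdd(n - 1)
def isOdd(n: Int):  Boolean = if (n == 0) false else isEven(n - 1)

println(isEven(10))   // prints true
```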

