Reference: http://spark.apache.org/docs/1.6.2/sql-programming-guide.html


// sc is an existing SparkContext.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// this is used to implicitly convert an RDD to a DataFrame.
import sqlContext.implicits._

// Define the schema using a case class.
// Note: Case classes in Scala 2.10 can support only up to 22 fields. To work around this limit,
// you can use custom classes that implement the Product interface.
case class Person(name: String, age: Int)

// Create an RDD of Person objects and register it as a table.
val people = sc.textFile("examples/src/main/resources/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt)).toDF()
people.registerTempTable("people")

// SQL statements can be run by using the sql methods provided by sqlContext.
val teenagers = sqlContext.sql("SELECT name, age FROM people WHERE age >= 13 AND age <= 19")

// The results of SQL queries are DataFrames and support all the normal RDD operations.
// The columns of a row in the result can be accessed by field index:
teenagers.map(t => "Name: " + t(0)).collect().foreach(println)

// or by field name:
teenagers.map(t => "Name: " + t.getAs[String]("name")).collect().foreach(println)

// row.getValuesMap[T] retrieves multiple columns at once into a Map[String, T]
teenagers.map(_.getValuesMap[Any](List("name", "age"))).collect().foreach(println)
// Map("name" -> "Justin", "age" -> 19)



Adding jars

  • spark-shell --jars <jar_name1,jar_name2, ...>
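
A concrete invocation might look like the following; the jar paths and the imported class are hypothetical placeholders, and classes contained in the listed jars become importable in the shell:

$ spark-shell --jars /home/hadoop/libs/mylib.jar,/home/hadoop/libs/other-dep.jar

scala> import com.example.mylib.SomeClass   // hypothetical class shipped in mylib.jar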



Just as in the plain Scala REPL, the following prints the Scala version.


scala> scala.util.Properties.versionString

res0: String = version 2.10.5
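
Two related calls, shown here as a small sketch rather than a captured session: versionNumberString returns just the number, and sc.version reports the Spark version of the running shell.

scala> scala.util.Properties.versionNumberString   // e.g. "2.10.5"

scala> sc.version                                  // Spark version of the running shell, e.g. "1.6.0"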



One constraint when calling saveAsTextFile from spark-shell is that the output directory must not already contain files.

So the old files have to be deleted beforehand; for S3 this can be done with a single hadoop fs -rmr command.

To run a hadoop command from inside spark-shell, do the following.


scala> import sys.process._

import sys.process._


scala> "hadoop fs -rmr s3://버킷명/지우고_싶은_디렉토리" !



You could also use the AWS S3 Java SDK, but that is more hassle.
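
A middle ground, without shelling out or pulling in the S3 SDK, is the Hadoop FileSystem API that is already on the spark-shell classpath. A rough sketch; the path is a hypothetical placeholder:

scala> import java.net.URI
scala> import org.apache.hadoop.fs.{FileSystem, Path}

scala> val output = "s3://bucket_name/directory_to_delete"
scala> val fs = FileSystem.get(new URI(output), sc.hadoopConfiguration)
scala> fs.delete(new Path(output), true)   // true = delete recursively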

Reference
* http://alvinalexander.com/scala/scala-execute-exec-external-system-commands-in-scala




Dependencies can also be pulled from Maven Central at launch with --packages. The session below (Spark 1.6.0) loads the lamma date library and uses it to print a date range.

$ spark-shell --packages "io.lamma:lamma_2.11:2.3.0"
Ivy Default Cache set to: /home/hadoop/.ivy2/cache
The jars for the packages stored in: /home/hadoop/.ivy2/jars
:: loading settings :: url = jar:file:/usr/lib/spark/lib/spark-assembly-1.6.0-hadoop2.7.1-amzn-0.jar!/org/apache/ivy/core/settings/ivysettings.xml
joda-time#joda-time added as a dependency
io.lamma#lamma_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
        confs: [default]
        found joda-time#joda-time;2.9.2 in central
        found io.lamma#lamma_2.11;2.3.0 in central
downloading https://repo1.maven.org/maven2/io/lamma/lamma_2.11/2.3.0/lamma_2.11-2.3.0.jar ...
        [SUCCESSFUL ] io.lamma#lamma_2.11;2.3.0!lamma_2.11.jar (68ms)
:: resolution report :: resolve 1799ms :: artifacts dl 78ms
        :: modules in use:
        io.lamma#lamma_2.11;2.3.0 from central in [default]
        joda-time#joda-time;2.9.2 from central in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   2   |   1   |   1   |   0   ||   2   |   1   |
        ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
        confs: [default]
        1 artifacts copied, 1 already retrieved (338kB/13ms)

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/

Using Scala version 2.10.5 (OpenJDK 64-Bit Server VM, Java 1.7.0_91)
Type in expressions to have them evaluated.
Type :help for more information.
16/02/11 05:49:39 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark context available as sc.
Thu Feb 11 05:49:54 UTC 2016 Thread[main,5,main] java.io.FileNotFoundException: derby.log (Permission denied)
----------------------------------------------------------------
Thu Feb 11 05:49:55 UTC 2016:
Booting Derby version The Apache Software Foundation - Apache Derby - 10.10.1.1 - (1458268): instance a816c00e-0152-cee0-f31c-000025557508
on database directory /mnt/tmp/spark-9e5a20ed-acea-4dc8-ab4b-5803c4a3dca9/metastore with class loader sun.misc.Launcher$AppClassLoader@753d556f
Loaded from file:/usr/lib/spark/lib/spark-assembly-1.6.0-hadoop2.7.1-amzn-0.jar
java.vendor=Oracle Corporation
java.runtime.version=1.7.0_91-mockbuild_2015_10_27_19_01-b00
user.dir=/etc/spark/conf.dist
os.name=Linux
os.arch=amd64
os.version=4.1.13-19.30.amzn1.x86_64
derby.system.home=null
Database Class Loader started - derby.database.classpath=''
16/02/11 05:50:04 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/02/11 05:50:04 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
SQL context available as sqlContext.

scala> import io.lamma.Date
import io.lamma.Date

scala> Date("2016-02-01") to Date("2016-02-11") foreach println
Date(2016,2,1)
Date(2016,2,2)
Date(2016,2,3)
Date(2016,2,4)
Date(2016,2,5)
Date(2016,2,6)
Date(2016,2,7)
Date(2016,2,8)
Date(2016,2,9)
Date(2016,2,10)
Date(2016,2,11)

scala>
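
For multiple dependencies, --packages takes a comma-separated list of Maven coordinates. The pairing below is only an illustration, reusing the coordinates seen in the resolution log above:

$ spark-shell --packages "io.lamma:lamma_2.11:2.3.0,joda-time:joda-time:2.9.2"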


