Last Updated: February 25, 2016
· chris_betz

Using Apache Spark from Clojure

Here's a short example of how to process data from Clojure using Apache Spark and the Sparkling library. It runs locally against a tiny in-memory dataset, but the same code shape applies to big data:

(do
  (require '[sparkling.conf :as conf])
  (require '[sparkling.core :as spark])
  (spark/with-context     ; this creates a spark context from the given config
    sc
    (-> (conf/spark-conf)
        (conf/app-name "sparkling-test")
        (conf/master "local"))
    (let [lines-rdd
          ;; Here we provide the data from a Clojure collection.
          ;; You could also read it from a text file or an Avro file,
          ;; or even query a JDBC data source.
          (spark/into-rdd sc ["This is a first line"
                              "Testing spark"
                              "and sparkling"
                              "Happy hacking!"])]
      (spark/collect             ; get every element from the filtered RDD
        (spark/filter            ; filter elements in the given RDD (lines-rdd)
          #(.contains % "spark") ; a pure Clojure function as filter predicate
          lines-rdd)))))
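
To check what the job should return, you can run the same predicate over the plain Clojure vector, without Spark. This is a minimal sketch and not part of the original example: the names `lines`, `mentions-spark?` and `matching-lines` are made up for illustration, and it uses `clojure.core/filter` instead of Sparkling. The match is case-sensitive, so only the second and third lines qualify.

```clojure
;; Plain Clojure, no Spark needed: apply the same substring test
;; with clojure.core/filter to see which lines should survive.
(def lines ["This is a first line"
            "Testing spark"
            "and sparkling"
            "Happy hacking!"])

(defn mentions-spark?
  "True when the line contains the substring \"spark\" (case-sensitive).
  Hypothetical helper; the type hint avoids reflection on .contains."
  [^String line]
  (.contains line "spark"))

;; vec forces the lazy seq so the result is a concrete vector
(def matching-lines
  (vec (filter mentions-spark? lines)))

(prn matching-lines)
;; prints ["Testing spark" "and sparkling"]
```

The Spark version above should collect the same two strings, though the collection type it returns may differ from a Clojure vector.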