Last Updated: February 25, 2016
· runexec

Clojure: The real world of partitioning, interleaving, and regular expressions.

At first glance, the relationship between (re-seq), (partition), and (interleave) isn't very apparent.
**This article is my attempt to show you the real-world use of these functions in a statistical data collection program I wrote.**
*If you can't wait, just skip to the bottom for the amazing part.*

What is (re-seq)?

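(re-seq) returns a lazy sequence of the successive matches of a regular expression in a string. When the pattern contains capture groups, each match is a vector holding the whole match followed by the groups, which is why the examples below take the (last) element.
Clojure runs on the JVM and uses Java's regular expressions (java.util.regex).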

Example (two bindings from the let in the full code below, where data is the page HTML as a string):

points (map #(.replace
              (str (last %))
              "," "")
            (re-seq #"<span(.*)number(.*)>(\d+)"
                    data))
wins (map #(last %)
          (re-seq #"<td(.*)class(.*)wins green(.*)>(\d+)"
                  data))
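
To make the shape of the (re-seq) result concrete, here is a minimal sketch that runs on its own. The HTML fragment and the name sample-html are made up for illustration; the real page markup is not shown in this post.

```clojure
;; Hypothetical HTML fragment, one table cell per line.
(def sample-html
  (str "<td class=\"wins green\">40</td>\n"
       "<td class=\"wins green\">12</td>\n"))

;; With a capture group, each match is a vector:
;; [whole-match group-1]
(def matches
  (re-seq #"<td class=\"wins green\">(\d+)" sample-html))

;; The last element of each vector is the last capture group.
(def wins (map last matches))
;; wins => ("40" "12")

;; Without capture groups, each match is just a string.
(def digits (re-seq #"\d+" sample-html))
;; digits => ("40" "12")

(println wins)
```

Note that the captured values are strings, not numbers.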

What is (partition)?

(partition) splits a collection into groups of N items, where N is a positive integer. Trailing items that do not fill a complete group are dropped.
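
A small sketch of the grouping behaviour, using a made-up range of numbers. It also shows (partition-all), the clojure.core variant that keeps a short final group, in case dropping items is not what you want.

```clojure
(def items (range 8)) ; 0 through 7

;; Groups of 3; the leftover 6 and 7 do not fill a group and are dropped.
(def groups (partition 3 items))
;; groups => ((0 1 2) (3 4 5))

;; partition-all keeps the incomplete final group.
(def all-groups (partition-all 3 items))
;; all-groups => ((0 1 2) (3 4 5) (6 7))

(println groups)
(println all-groups)
```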

What is (interleave)?

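(interleave) merges several collections by taking the first item of each, then the second item of each, and so on, so the order within every collection is preserved. It stops as soon as the shortest collection runs out, so the collections should all have the same count.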

Example:

;; Every 7 items == 1 new collection
       stats (partition 7
                        (interleave regions
                                    characters
                                    points
                                    wins
                                    losses
                                    ratios
                                    divisions))
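
The same idea on a smaller scale, as a sketch you can paste into a REPL. The three columns are made-up stand-ins for the scraped ones (with three columns, every 3 items form one row), and short-rows is a hypothetical name used to show what happens when one column comes back shorter than the others.

```clojure
;; Made-up columns, one value per result row.
(def regions ["AM" "EU" "KR/TW"])
(def wins    ["40" "12" "7"])
(def losses  ["0" "0" "0"])

;; interleave walks the columns in step.
(def flat (interleave regions wins losses))
;; flat => ("AM" "40" "0" "EU" "12" "0" "KR/TW" "7" "0")

;; partition cuts the flat sequence back into rows.
(def rows (partition 3 flat))
;; rows => (("AM" "40" "0") ("EU" "12" "0") ("KR/TW" "7" "0"))

;; If one column is short, interleave stops early and rows go missing.
(def short-rows (partition 3 (interleave regions wins ["0"])))
;; short-rows => (("AM" "40" "0"))

(println rows)
```

The row order depends entirely on every regex matching exactly once per result row, so it is worth checking that all columns have the same count before trusting the output. The rows produced this way are the same ones (map list regions wins losses) would give.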

Real World: Starcraft 2 Statistical Data Collection

The code in the next section produces output like the following (truncated for brevity).

ranks.core=> (page-search "vortex")
{:region AM,
 :url /us/3591970/vortex,
 :points 434,
 :wins 40,
 :losses 0,
 :ratio 100.00,
 :division_id 331288}
{:region EU,
 :url /ru/762577/ChaosVortex,
 :points 247,
 :wins 12,
 :losses 0,
 :ratio 100.00,
 :division_id 330278}
{:region KR/TW,
 :url /kr/4032602/Vortex,
 :points 173,
 :wins 7,
 :losses 0,
 :ratio 100.00,
 :division_id 324826}


The code

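;; Assumed requires: the original post does not show its ns form.
;; client/get looks like clj-http, and jdbc/with-connection looks like
;; the older clojure.java.jdbc API.
(ns ranks.core
  (:require [clj-http.client :as client]
            [clojure.java.jdbc :as jdbc]))

;; db (used at the bottom of page-search) is a database spec
;; that is defined elsewhere and not shown in this post.

(defstruct sc-stat
  :region
  :url
  :points
  :wins
  :losses
  :ratio
  :division_id)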

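(defn page-search
  "Fetches the search page for name, scrapes one column of values
  per regex, and prints one sc-stat per result row."
  [name]
  (let
      [url (str "http://site.com/"
                name)
       data (:body (client/get url))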
       regions (map #(last %)
                    (re-seq #"<td(.*)class(.*)region(.*)>(.*)</td>"
                            data))
       characters (map #(last %)
                    (re-seq #"<td(.*)class(.*)character0(.*)href=\"(.*)\""
                            data))
       points (map #(.replace
                     (str (last %))
                     "," "")
                   (re-seq #"<span(.*)number(.*)>(\d+)"
                           data))
       wins (map #(last %)
                 (re-seq #"<td(.*)class(.*)wins green(.*)>(\d+)"
                         data))
       losses (map #(last %)
                   (re-seq #"<td(.*)class(.*)losses red(.*)>(\d+)"
                           data))
       ratios (map #(apply str (drop-last
                                (last %)))
                 (re-seq #"<td(.*)class(.*)ratio(.*)>(\S+)<"
                         data))
       divisions (map #(last %)
                      (re-seq #"/div/(\d+)"
                              data))
       stats (partition 7
                        (interleave regions
                                    characters
                                    points
                                    wins
                                    losses
                                    ratios
                                    divisions))
       db-stats (map #(struct sc-stat
                              (nth % 0)
                              (.replace (str
                                         (nth % 1))
                                         "'"
                                         "")
                              (nth % 2)
                              (nth % 3)
                              (nth % 4)
                              (nth % 5)
                              (nth % 6))
                     stats)]
    (jdbc/with-connection db
      (doseq [s db-stats]
        (try
          (comment
          (println
           (jdbc/update-or-insert-values
            :scusers
            ["url=?" (:url s)]
            s) s))
          (println s)
          (catch Exception e
            (println "Warning: "
                     (.getMessage e))))))))
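
For experimenting without the HTTP request or the database, the scraping steps above can be sketched on a hard-coded string. Everything named here (sample-html, column, parse-rows) is hypothetical, the markup is made up, and (zipmap) stands in for the struct to keep the example short. The sample puts one cell per line because `.` does not match newlines and `.*` is greedy, so these patterns find at most one match per line.

```clojure
;; Hypothetical markup, one cell per line.
(def sample-html
  (str "<td class=\"region\">AM</td>\n"
       "<td class=\"wins green\">40</td>\n"
       "<td class=\"losses red\">0</td>\n"
       "<td class=\"region\">EU</td>\n"
       "<td class=\"wins green\">12</td>\n"
       "<td class=\"losses red\">0</td>\n"))

(defn column
  "Returns the last capture group of every match of pattern in html."
  [pattern html]
  (map last (re-seq pattern html)))

(defn parse-rows
  "Builds one map per result row from three scraped columns."
  [html]
  (let [regions (column #"<td(.*)class(.*)region(.*)>(.*)</td>" html)
        wins    (column #"<td(.*)class(.*)wins green(.*)>(\d+)" html)
        losses  (column #"<td(.*)class(.*)losses red(.*)>(\d+)" html)]
    ;; Every 3 items == 1 row, then name the fields.
    (map #(zipmap [:region :wins :losses] %)
         (partition 3 (interleave regions wins losses)))))

(doseq [row (parse-rows sample-html)]
  (println row))
```

Separating the parsing from the fetching like this also makes it easy to check the regexes against a saved copy of the page.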

1 Response

Thanks for this. I highly recommend https://github.com/cgrand/enlive for the screenscraping bit!

over 1 year ago