
Why does this Clojure code run out of memory?

I have a sorted text file with twenty million lines, many of which are duplicates. I have some Clojure code that counts how many instances there are of each unique line, i.e. the output is something like:

alpha 20
beta 17
gamma 3
delta 4
...

The code works for smaller files, but on this larger one, it runs out of memory. What am I doing wrong? I assume that somewhere I am holding on to the head.

(require '[clojure.java.io :as io])

(def bi-grams (line-seq (io/reader "the-big-input-file.txt")))

(defn quick-process [input-list filename]
  (with-open [out (io/writer filename)] ;; e.g. "train/2gram-freq.txt"
    (binding [*out* out]
      (dorun (map (fn [run] (println (first run) "\t" (count run)))
                  (partition-by identity input-list))))))

(quick-process bi-grams "output.txt")

Your bi-grams var is holding on to the head of the line-seq: the top-level def keeps a reference to the first cell of the lazy sequence, so every line that gets realized stays reachable and can never be garbage-collected.

Try (quick-process (line-seq (io/reader "the-big-input-file.txt")) "output.txt") instead. Clojure clears local references as they are consumed, so once the sequence is only reachable through the function argument, the lines you have already processed can be garbage-collected.
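A tidier variant of the same idea is to skip the top-level def entirely and open the reader inside the processing function, so every realized line becomes garbage as soon as it has been counted. A minimal sketch (the function name count-runs and the file names are illustrative, not from the original post):

```clojure
(require '[clojure.java.io :as io])

;; Count consecutive runs of identical lines in a sorted file and write
;; "line <tab> count" pairs to out-file. The reader is opened here, so no
;; top-level var ever pins the head of the lazy line-seq.
(defn count-runs [in-file out-file]
  (with-open [rdr (io/reader in-file)
              out (io/writer out-file)]
    (binding [*out* out]
      (doseq [run (partition-by identity (line-seq rdr))]
        ;; each run is realized in full, but earlier runs can be GC'd
        (println (first run) "\t" (count run))))))

;; (count-runs "the-big-input-file.txt" "output.txt")
```

This also fixes a second problem with the original approach: the reader opened inside the def is never closed, whereas with-open closes both streams when processing finishes.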
