Splitting lines in Clojure while reading from a file

Question

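I'm learning Clojure at school and I have an exam coming up. I'm just working through a few things to make sure I've got the hang of it.

I'm trying to read a file line by line and, as I do, split each line wherever there is a ";".

This is my code so far: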

(defn readFile []
  (map (fn [line] (clojure.string/split line #";"))
  (with-open [rdr (reader "C:/Users/Rohil/Documents/work.txt.txt")]
    (doseq [line (line-seq rdr)]
      (clojure.string/split line #";")
        (println line)))))

When I do this, I still get this output:

"I;Am;A;String;"

Am I missing something?

Answer 1

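I'm not sure whether you need this for school, but since Gary has already given an excellent answer, consider this a bonus.

You can use transducers to do elegant transformations on lines of text. The ingredient you need is something that lets you treat the lines as a reducible collection and that closes the reader once you are done reducing: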

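;; The ^BufferedReader type hint needs the class to be imported.
(import '[java.io BufferedReader])

(defn lines-reducible [^BufferedReader rdr]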
  (reify clojure.lang.IReduceInit
    (reduce [this f init]
      (try
        (loop [state init]
          (if (reduced? state)
            @state
            (if-let [line (.readLine rdr)]
              (recur (f state line))
              state)))
        (finally
          (.close rdr))))))

Now you can do the following, given the input work.txt:

I;am;a;string
Next;line;please

Count the length of each "split":

(require '[clojure.string :as str])
(require '[clojure.java.io :as io])

(into []
      (comp
       (mapcat #(str/split % #";"))
       (map count))
      (lines-reducible (io/reader "/tmp/work.txt")))
;;=> [1 2 1 6 4 4 6]

Sum the lengths of all the "splits":

(transduce
 (comp
  (mapcat #(str/split % #";"))
  (map count))
 +
 (lines-reducible (io/reader "/tmp/work.txt")))
;;=> 24

Sum the lengths of all words until we find a word longer than 5:

(transduce
 (comp
  (mapcat #(str/split % #";"))
  (map count))
 (fn
   ([] 0)
   ([sum] sum)
   ([sum l]
    (if (> l 5)
      (reduced sum)
      (+ sum l))))
 (lines-reducible (io/reader "/tmp/work.txt")))

Or with take-while:

(transduce
 (comp
  (mapcat #(str/split % #";"))
  (map count)
  (take-while #(<= % 5)))
 +
 (lines-reducible (io/reader "/tmp/work.txt")))

For more details, read https://tech.grammarly.com/blog/building-etl-pipelines-with-clojure .
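For the question's own goal (one vector of fields per line), a minimal sketch using only core functions might look like the following. It swaps the custom reducible for line-seq inside with-open, and it reads from an in-memory StringReader so it runs without a file on disk. The name split-lines-on-semicolon and the sample text are assumptions for illustration, not part of the original answer.

```clojure
(require '[clojure.string :as str])
(import '[java.io BufferedReader StringReader])

(defn split-lines-on-semicolon
  "Hypothetical helper: reads every line from rdr, splits each one on \";\",
  and closes the reader when finished."
  [^BufferedReader rdr]
  (with-open [r rdr]
    ;; into is eager, so every line is consumed while the reader is still open
    (into [] (map #(str/split % #";")) (line-seq r))))

(split-lines-on-semicolon
 (BufferedReader. (StringReader. "I;am;a;string\nNext;line;please")))
;;=> [["I" "am" "a" "string"] ["Next" "line" "please"]]
```

Because into realizes the whole result before with-open closes the reader, this avoids the usual problem of a lazy sequence outliving its reader.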

Answer 2

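TL;DR: embrace the REPL and embrace immutability

Your question was "What am I missing?", and to that I'd say you're missing one of Clojure's best features, the REPL.

Edit: you may also be missing that Clojure uses immutable data structures.

Consider this code snippet: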

(doseq [x [1 2 3]]
   (inc x)
   (prn x))

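This code does not print "2 3 4".

It prints "1 2 3" because x is not a mutable variable.

During the first iteration (inc x) is called and returns 2, which is thrown away because it is not passed to anything; then (prn x) prints the value of x, which is still 1.

Now consider this code snippet: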

(doseq [x [1 2 3]] (prn (inc x)))

During the first iteration inc passes its return value to prn, so you get 2.
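The same point can be checked at the REPL without printing anything inside the loop. The sketch below is only an illustration; the names xs, doseq-result, and incremented are made up for this example.

```clojure
;; Values are immutable: inc returns a new number and leaves x alone.
(def xs [1 2 3])

;; doseq exists for side effects; it always returns nil.
(def doseq-result (doseq [x xs] (inc x)))

;; mapv collects the value returned for each element into a new vector.
(def incremented (mapv inc xs))

(prn doseq-result) ; prints nil
(prn incremented)  ; prints [2 3 4]
(prn xs)           ; prints [1 2 3]
```

If you want the transformed values, use something that returns them (map, mapv, for) rather than doseq.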

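Long-winded example:

I don't want to rob you of the opportunity to solve the problem yourself, so I'll use a different problem as an example.

Given the file "birds.txt" with the data "1chicken\n 2duck\n 3Larry", you want to write a function that takes a file and returns a sequence of bird names.

Let's break this problem down into smaller chunks:

First, let's read the file and split it into lines.

(slurp "birds.txt") will give us the whole file as a single string.

clojure.string/split-lines will give us a collection with each line as an element of the collection.

(clojure.string/split-lines (slurp "birds.txt")) gets us ["1chicken" "2duck" "3Larry"]

At this point we could map a function over that collection to strip out the numbers, like (map #(clojure.string/replace % #"\d" "") birds-collection)

Or we could move that step up to the point where the whole file is still one string.

Now that we have all of our pieces, we can put them together in a functional pipeline, where the result of one piece feeds into the next.

Clojure has a nice macro that makes this more readable, the -> macro.

It takes the result of one computation and injects it as the first argument to the next computation.

So our pipeline looks like this: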

(-> "C:/birds.txt"
     slurp
     (clojure.string/replace #"\d" "") 
     clojure.string/split-lines)
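To try the pipeline without a file on disk, a small sketch can run the same two steps on an in-memory string. The function name bird-names and the sample text are assumptions for illustration only.

```clojure
(require '[clojure.string :as str])

(defn bird-names
  "Hypothetical helper: strips the digits from text, then splits it into lines."
  [text]
  (-> text
      (str/replace #"\d" "")
      str/split-lines))

;; The threaded form above is equivalent to the nested call:
;; (str/split-lines (str/replace text #"\d" ""))

(bird-names "1chicken\n2duck\n3Larry")
;;=> ["chicken" "duck" "Larry"]
```

Reading from a file is then just one more step at the front of the pipeline, with slurp supplying the text.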

One last note on style: Clojure function names conventionally stick to kebab-case, so readFile should be read-file.

Answer 3

I would keep it simple and code it like this:

(ns tst.demo.core
  (:use tupelo.test)
  (:require [tupelo.core :as t]
            [clojure.string :as str] ))
(def text
 "I;am;a;line;
  This;is;another;one
  Followed;by;this;")

(def tmp-file-name "/tmp/lines.txt")

(dotest
  (spit tmp-file-name text) ; write it to a tmp file
  (let [lines       (str/split-lines (slurp tmp-file-name))
        result      (for [line lines]
                      (for [word (str/split line #";")]
                        (str/trim word)))
        result-flat (flatten result)]
    (is= result
      [["I" "am" "a" "line"]
       ["This" "is" "another" "one"]
       ["Followed" "by" "this"]])

Note that result is a doubly nested (2D) matrix of words. The simplest way to undo this is the flatten function, which produces result-flat:

    (is= result-flat
      ["I" "am" "a" "line" "This" "is" "another" "one" "Followed" "by" "this"])))

You could also use apply concat, as in:

(is= (apply concat result) result-flat)

If you want to avoid building the 2D matrix in the first place, you can use a generator function (à la Python) via lazy-gen and yield from the Tupelo library:

(dotest
  (spit tmp-file-name text) ; write it to a tmp file
  (let [lines  (str/split-lines (slurp tmp-file-name))
        result (t/lazy-gen
                 (doseq [line lines]
                   (let [words (str/split line #";")]
                     (doseq [word words]
                       (t/yield (str/trim word))))))]

    (is= result
      ["I" "am" "a" "line" "This" "is" "another" "one" "Followed" "by" "this"])))

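In this case, lazy-gen creates the generator function. Note that for has been replaced by doseq, and the yield function places each word into the output lazy sequence.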
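If pulling in a library is not an option, a flat sequence can also be produced with core functions alone. The sketch below is an alternative, not the Tupelo approach shown above: it uses a single for with two bindings, so no nested result is ever built. The names text and words are illustrative.

```clojure
(require '[clojure.string :as str])

(def text "I;am;a;line;\n  This;is;another;one\n  Followed;by;this;")

(def words
  ;; The second binding iterates over the fields of each line,
  ;; so the result is already flat.
  (vec (for [line (str/split-lines text)
             word (str/split line #";")]
         (str/trim word))))

words
;;=> ["I" "am" "a" "line" "This" "is" "another" "one" "Followed" "by" "this"]
```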

3 answers

Answer 1
11 2017-11-17 15:43:26

Answer 2
8 2017-11-16 17:30:27

Answer 3
-1 2017-11-17 17:10:25