
How do I parse and use a table in Clojure?

I have started my journey into Clojure and got stumped by the first problem I set for myself. I have a text file that is basically a table of n×m rows and columns. The first row holds the column names and the first column holds the row names. I want to parse this table in Clojure and later query table[row][column] to get that value.

  a  b  c
1 7  8  9
2 s  q  r
3 2  7  1

So, how would I consume the above table in Clojure? I am not really sure where to start. Can someone get me going in the right direction?

@Hendekagon's answer is a good way to get the job done, but we can also look at a from-scratch implementation. Though probably not the best solution, hopefully the sample design helps get you under way.

If you want to query your structure in Clojure, you're going to be thinking about maps. Let's take as our goal something that looks like this:

{"1" {"a" "7", "b" "8", "c" "9"},
 "2" {"a" "s", "b" "q", "c" "r"},
 "3" {"a" "2", "b" "7", "c" "1"}}

Here, row names are keys into maps from column names to table elements. With this structure, we can easily query an element of the table using get-in.

(get-in table ["2" "b"]) ; => "q"
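
To make the target concrete, here is a small sketch that writes the goal map out by hand and queries it. The name goal-table and the "n/a" default are only illustrative and are not part of the original answer.

```clojure
;; The goal structure, written out literally for the sample table.
(def goal-table
  {"1" {"a" "7", "b" "8", "c" "9"}
   "2" {"a" "s", "b" "q", "c" "r"}
   "3" {"a" "2", "b" "7", "c" "1"}})

;; Look up row "2", column "b".
(get-in goal-table ["2" "b"])        ; => "q"

;; A missing row or column yields nil, or a default if one is supplied.
(get-in goal-table ["9" "b"])        ; => nil
(get-in goal-table ["2" "z"] "n/a")  ; => "n/a"
```

Note that every key and value here is a string, because nothing has been converted to numbers.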

Okay. How do we do it?

Let's pretend for a second that we've already read in our file and have it as a string. Then we need to transform it into our map-of-maps. Our function is going to look something like this:

(defn parse-table
  [raw-table-data]
  ...)

The first step is to pull out all of the important bits of data: the row names, the column names, and the table elements. However, before we can grab them, we need to parse the raw-table-data string into a structure that is more easily traversed. We'll split the string on newlines, then tokenize the lines on whitespace using a helper function tokens.

(require '[clojure.string :refer [split split-lines trim]])

(defn tokens
  [s]
  (-> s trim (split #"\s+")))
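
As a quick check of the helper, the sketch below runs tokens on two lines of the sample table. The helper is repeated only so the snippet runs on its own.

```clojure
(require '[clojure.string :refer [split trim]])

;; Same helper as above, repeated so this snippet is self-contained.
(defn tokens
  [s]
  (-> s trim (split #"\s+")))

(tokens "  a  b  c")  ; => ["a" "b" "c"]
(tokens "1 7  8  9")  ; => ["1" "7" "8" "9"]

;; Without trim, the leading whitespace produces an empty first token.
(split "  a  b  c" #"\s+")  ; => ["" "a" "b" "c"]
```

The trim matters for the header line, which starts with whitespace because it has no row name.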

(defn parse-table
  [raw-table-data]
  (let [table-data (map tokens (split-lines raw-table-data))]
    ...))

table-data looks something like this:

 [["a", "b", "c"],
  ["1", "7", "8", "9"],
  ["2", "s", "q", "r"],
  ["3", "2", "7", "1"]]

This makes it easy to get to the good stuff:

(defn parse-table
  [raw-table-data]
  (let [table-data (map tokens (split-lines raw-table-data))
        column-names (first table-data)
        row-names (map first (next table-data))
        contents (map next (next table-data))]
    ...))
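
To see what those three bindings hold, here is a sketch that applies the same expressions to the sample table-data, written out by hand rather than parsed.

```clojure
;; table-data for the sample table, written out by hand.
(def table-data
  [["a" "b" "c"]
   ["1" "7" "8" "9"]
   ["2" "s" "q" "r"]
   ["3" "2" "7" "1"]])

;; Column names: the first line.
(first table-data)             ; => ["a" "b" "c"]

;; Row names: the first token of every remaining line.
(map first (next table-data))  ; => ("1" "2" "3")

;; Contents: everything after the row name on every remaining line.
(map next (next table-data))   ; => (("7" "8" "9") ("s" "q" "r") ("2" "7" "1"))
```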

With the data teased out, we just need to stitch it together. An easy way to do this is to build all of our individual mappings of row-to-column-to-element and then combine them. I'll mention that this isn't the most efficient way, but it's pretty clean.

With a helper function pairs that simply sticks the elements of two collections side by side, we can get a sequence of mappings using a for comprehension.

(defn pairs
  [coll1 coll2]
  (map vector coll1 coll2))

(for [[row-name row-contents] (pairs row-names contents)
      [column-name element] (pairs column-names row-contents)]
  {row-name {column-name element}})
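
Here is a sketch of what pairs and the comprehension produce. It uses a cut-down table of two rows and two columns to keep the output short, and the literal inputs are made up for illustration.

```clojure
(defn pairs
  [coll1 coll2]
  (map vector coll1 coll2))

(pairs ["a" "b" "c"] ["7" "8" "9"])
;; => (["a" "7"] ["b" "8"] ["c" "9"])

;; Outer binding walks the rows, inner binding walks the columns of each row.
(for [[row-name row-contents] (pairs ["1" "2"] [["7" "8"] ["s" "q"]])
      [column-name element]   (pairs ["a" "b"] row-contents)]
  {row-name {column-name element}})
;; => ({"1" {"a" "7"}} {"1" {"b" "8"}} {"2" {"a" "s"}} {"2" {"b" "q"}})
```

Each result is a map with a single row key pointing at a single-column map, so there is one small map per table cell.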

This gives a sequence of maps-to-maps. We just need to merge it into one big map and the function is complete.

(defn parse-table
  [raw-table-data]
  (let [table-data (map tokens (split-lines raw-table-data))
        column-names (first table-data)
        row-names (map first (next table-data))
        contents (map next (next table-data))]
    (apply merge-with merge
      (for [[row-name row-contents] (pairs row-names contents)
            [column-name element] (pairs column-names row-contents)]
        {row-name {column-name element}}))))
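
The reason for merge-with merge rather than a plain merge can be seen on a handful of single-cell maps. The single-entries vector below is a made-up example, not output from a real file.

```clojure
(def single-entries
  [{"1" {"a" "7"}} {"1" {"b" "8"}} {"2" {"a" "s"}} {"2" {"b" "q"}}])

;; Plain merge lets the last map win for each row key, so columns are lost.
(apply merge single-entries)
;; => {"1" {"b" "8"}, "2" {"b" "q"}}

;; merge-with merge combines the inner maps that share a row key.
(apply merge-with merge single-entries)
;; => {"1" {"a" "7", "b" "8"}, "2" {"a" "s", "b" "q"}}
```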

Now, we can slurp up a table file and parse it.

(def table
  (->
    "file"
    slurp
    parse-table))
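
If no file is at hand, the same pipeline can be tried on an in-memory source, since slurp also accepts a java.io.Reader. The sketch below repeats the helpers so it runs on its own. The names sample-source and parsed-table are illustrative.

```clojure
(require '[clojure.string :refer [split split-lines trim]])

;; Helpers and parse-table repeated from above so the snippet is self-contained.
(defn tokens [s]
  (-> s trim (split #"\s+")))

(defn pairs [coll1 coll2]
  (map vector coll1 coll2))

(defn parse-table [raw-table-data]
  (let [table-data   (map tokens (split-lines raw-table-data))
        column-names (first table-data)
        row-names    (map first (next table-data))
        contents     (map next (next table-data))]
    (apply merge-with merge
           (for [[row-name row-contents] (pairs row-names contents)
                 [column-name element]   (pairs column-names row-contents)]
             {row-name {column-name element}}))))

;; Stand-in for the file: a reader over the sample table's text.
(def sample-source
  (java.io.StringReader. "  a  b  c\n1 7  8  9\n2 s  q  r\n3 2  7  1\n"))

(def parsed-table
  (-> sample-source slurp parse-table))

(get-in parsed-table ["3" "c"])  ; => "1"
```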

This gets us to our goal.

(println (get-in table ["2" "b"])) ; prints q
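
Since the approach above builds one small map per cell before merging, one possible variant is to build each row's map directly with zipmap. This is only a sketch: parse-table-zipmap is a hypothetical name, and it assumes every row has a row name followed by one value per column.

```clojure
(require '[clojure.string :refer [split split-lines trim]])

(defn tokens [s]
  (-> s trim (split #"\s+")))

;; Hypothetical variant: one zipmap per row, plus one for the whole table.
(defn parse-table-zipmap
  [raw-table-data]
  (let [[column-names & rows] (map tokens (split-lines raw-table-data))]
    (zipmap (map first rows)
            (map #(zipmap column-names (rest %)) rows))))

(parse-table-zipmap "  a  b\n1 7  8\n2 s  q")
;; => {"1" {"a" "7", "b" "8"}, "2" {"a" "s", "b" "q"}}
```

The result has the same shape as before, so get-in queries work unchanged.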

Use https://github.com/clojure/data.csv. Your file will become a sequence of vectors, each being a row. You can then parse the rows with a function like this:

(defn parse-row [[a b c]]
 [(Integer/parseInt a) (Double/parseDouble b) (str c)])

(Note the destructuring in the argument list; it makes it easier to refer to the columns by name.)

Then call (map parse-row rows) to get the parsed table.
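
As a sketch of that last step, the snippet below applies parse-row to hand-written rows instead of reading a file. The rows vector is hypothetical and simply mimics the vectors of strings that data.csv returns.

```clojure
(defn parse-row [[a b c]]
  [(Integer/parseInt a) (Double/parseDouble b) (str c)])

;; Hypothetical rows: each is a vector of strings, one per column.
(def rows [["1" "7.5" "x"] ["2" "0.25" "y"]])

(map parse-row rows)
;; => ([1 7.5 "x"] [2 0.25 "y"])
```

A row whose second field is not numeric, such as the "s" in the question's sample table, would make Double/parseDouble throw a NumberFormatException, so the conversions need to match the actual column types.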

Another way is to use Incanter, which will turn your CSV file into a matrix that is easier to query.
