Database Functional Programming in Clojure

"It is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail." - Abraham Maslow

I need to write a tool to dump a large hierarchical (SQL) database to XML. The hierarchy consists of a Person table with subsidiary Address, Phone, etc. tables.

  • I have to dump thousands of rows, so I would like to do so incrementally rather than keep the whole XML file in memory.

  • I would like to isolate impure code to a small portion of the application.

  • I think this might be a good opportunity to explore FP and concurrency in Clojure. I can also show the benefits of immutable data and multi-core utilization to my skeptical co-workers.

I'm not sure what the overall architecture of the application should be. I am thinking that I can use an impure function to retrieve the database rows and return a lazy sequence, which can then be processed by a pure function that returns an XML fragment.
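A minimal sketch of that split, assuming each row arrives as a Clojure map. `fetch-person-rows` is a hypothetical stand-in for the impure query (here it just returns literal data), while `xml-escape` and `row->xml` are illustrative pure helpers:

```clojure
(require '[clojure.string :as str])

;; Pure: escape the five XML special characters.
(defn xml-escape [s]
  (str/escape (str s) {\& "&amp;" \< "&lt;" \> "&gt;" \" "&quot;" \' "&apos;"}))

;; Pure: turn one row map into an XML fragment string.
(defn row->xml [tag row]
  (str "<" tag ">"
       (apply str (for [[k v] row]
                    (str "<" (name k) ">" (xml-escape v) "</" (name k) ">")))
       "</" tag ">"))

;; Impure in the real tool (hypothetical): would run the SQL query and
;; return a lazy seq of row maps. Here it fakes two rows.
(defn fetch-person-rows []
  (lazy-seq [{:id 1 :name "Ann & Bob"} {:id 2 :name "Carl"}]))

;; The pure function is mapped lazily over the rows, one fragment at a time.
(defn person-fragments []
  (map #(row->xml "person" %) (fetch-person-rows)))
```

Only `fetch-person-rows` touches the outside world; everything downstream can be tested with plain maps.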

For each Person row, I can create a Future and have several of them processed in parallel (the output order does not matter).

As each Person is processed, the task will retrieve the appropriate rows from the Address, Phone, etc. tables and generate the nested XML.
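Sketched with futures, under the assumption that child rows can be looked up by person id. `child-tables`, `fetch-children`, `person->xml` and `people->xml` are hypothetical names, and the in-memory map only stands in for the real Address and Phone queries:

```clojure
;; Hypothetical in-memory stand-in for the Address and Phone tables,
;; keyed by person id. A real version would query the database.
(def child-tables
  {"address" {1 ["1 Main St"] 2 ["9 Elm Rd"]}
   "phone"   {1 ["555-0100" "555-0101"] 2 []}})

(defn fetch-children [table person-id]
  (get-in child-tables [table person-id] []))

;; Build the nested XML for one person (values not escaped, for brevity).
(defn person->xml [{:keys [id name]}]
  (str "<person id=\"" id "\"><name>" name "</name>"
       (apply str (for [table ["address" "phone"]
                        value (fetch-children table id)]
                    (str "<" table ">" value "</" table ">")))
       "</person>"))

;; One future per person; doall makes sure every future is started
;; before we begin blocking on any of them with deref.
(defn people->xml [people]
  (let [futs (doall (map #(future (person->xml %)) people))]
    (map deref futs)))
```

Collecting results with `deref` happens to return them in input order even though the work runs in parallel; since order does not matter here, writing each fragment as soon as its future finishes would also be fine.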

I can use a generic function to process most of the tables, relying on database metadata to get the column information, with special functions for the few tables that need custom processing. These functions could be listed in a map (table name -> function).
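The map-of-handlers idea might look like the following sketch. `generic-table->xml`, `phone->xml` and `custom-handlers` are hypothetical, and the generic handler takes column names from the row's keys rather than from real database metadata (which would come from JDBC's `ResultSetMetaData`):

```clojure
;; Generic handler: emits every column as a child element. Column names
;; come from the row's keys here, standing in for database metadata.
(defn generic-table->xml [table row]
  (str "<" table ">"
       (apply str (for [[col v] row]
                    (str "<" (name col) ">" v "</" (name col) ">")))
       "</" table ">"))

;; Hypothetical custom handler for a table that needs special treatment.
(defn phone->xml [_table row]
  (str "<phone type=\"" (name (:type row)) "\">" (:number row) "</phone>"))

;; table name -> function; tables not listed fall back to the generic one.
(def custom-handlers {"phone" phone->xml})

(defn table-row->xml [table row]
  ((get custom-handlers table generic-table->xml) table row))
```

Using `get` with a default keeps the dispatch to a single lookup, and adding a custom table is just one more map entry.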

Am I going about this in the right way? I can easily fall back to doing it in OO using Java, but that would be no fun.

BTW, are there any good books on FP patterns or architecture? I have several good books on Clojure, Scala, and F#, but although each covers its language well, none looks at the "big picture" of functional programming design.

Ok, cool, you're using this as an opportunity to showcase Clojure. So, you want to demonstrate FP and concurrency. Roger that.

To wow your interlocutors, I would make a point of demonstrating:

  • The performance of your program using a single thread.
  • How your program's performance scales as you increase the number of threads.
  • How easy it is to take your program from single-threaded to multi-threaded.
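For the last point, one low-effort way to show the switch (a technique swapped in here, not taken from the question) is Clojure's `pmap`, which has the same shape as `map`. `slow-convert`, `run-single` and `run-parallel` are hypothetical, and the `Thread/sleep` only simulates per-row work:

```clojure
;; Stand-in for per-row conversion; the sleep simulates slow I/O-bound
;; work so that the parallel speed-up is visible with `time`.
(defn slow-convert [row]
  (Thread/sleep 50)
  (str "<row>" row "</row>"))

;; Single-threaded: doall forces the lazy map so `time` measures the work.
(defn run-single [rows] (doall (map slow-convert rows)))

;; Multi-threaded: pmap runs the same function on a bounded number of
;; threads (about the number of cores + 2) and keeps results in order.
(defn run-parallel [rows] (doall (pmap slow-convert rows)))

(comment
  (time (run-single (range 20)))
  (time (run-parallel (range 20))))
```

The change from one version to the other is a single symbol, which makes for a convincing demo, though real gains depend on where the bottleneck (CPU, database, disk) actually is.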

You might create a function to dump a single table to an XML file.

(defn table-to-xml [name] ...)

With that, you can work out all of your code for the core task of converting your relational data to XML.
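One possible shape for `table-to-xml`, assuming the query can hand back a lazy sequence of row maps. `fetch-rows` is a hypothetical stub, the parameter is named `table-name` to avoid shadowing `clojure.core/name`, and the one-attribute `<row>` element is only illustrative:

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical stand-in for the real query; assumed to return a lazy
;; seq of row maps. Here it fakes three rows.
(defn fetch-rows [table-name]
  (map (fn [i] {:id i}) (range 3)))

;; Stream one table to <table-name>.xml, writing each row as soon as it
;; is produced. Values are not XML-escaped in this sketch.
(defn table-to-xml [table-name]
  (with-open [w (io/writer (str table-name ".xml"))]
    (.write w (str "<" table-name ">\n"))
    (doseq [row (fetch-rows table-name)]
      (.write w (str "  <row id=\"" (:id row) "\"/>\n")))
    (.write w (str "</" table-name ">\n"))))
```

Because `doseq` walks the sequence one element at a time and writes immediately, memory use should stay roughly flat, provided `fetch-rows` really is lazy and nothing else holds on to the head of the sequence.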

Now that you've solved the core problem, see if throwing more threads at it increases your speed.

You might modify table-to-xml to accept an additional parameter:

(defn table-to-xml [name thread-count] ...)

This implies that you have n threads working on one table. In this case, each thread might process every nth row. A problem with putting multiple threads on one table is that each thread is going to want to write to the same XML file. That bottleneck may make the strategy useless, but it's worth a shot.
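A sketch of the every-nth-row strategy, assuming all threads share one `java.io.Writer`. `stripe` and `table-rows-to-writer` are hypothetical helpers, and `convert` is whatever row-to-XML function you pass in; the `locking` around each write is exactly where the contention mentioned above shows up:

```clojure
;; Thread i of n takes rows i, i+n, i+2n, ...
(defn stripe [rows i n]
  (take-nth n (drop i rows)))

;; Each stripe is converted on its own future. Conversion runs in
;; parallel, but every write to the shared writer is serialized.
(defn table-rows-to-writer [rows thread-count ^java.io.Writer w convert]
  (let [futs (doall
              (for [i (range thread-count)]
                (future
                  (doseq [row (stripe rows i thread-count)]
                    (let [xml (convert row)]
                      (locking w (.write w ^String xml)))))))]
    ;; Block until every stripe has been written.
    (run! deref futs)))
```

Rows from different stripes interleave in the output in no particular order, which is acceptable here only because output order does not matter.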

If creating one XML file per table is acceptable, then spawning one thread per table would likely be an easy win.

;; mapv is eager, so every future starts now (a lazy map might never start them)
(mapv #(future (table-to-xml %)) (table-names))

Using just a one-to-one relationship between tables, files, and threads: as a guideline, I would expect your code to contain no refs or dosyncs, and the solution should be pretty straightforward.

Once you start spawning multiple threads per table, you are adding complexity and may not see much of a performance increase.

In any case, you would likely have one or two queries per table for getting values and metadata. Regarding your comment about not wanting to load all the data into memory: each thread would only be processing one row at a time.

Hope that helps!

Given your comment, here's some pseudo-ish code that might help:

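(require '[clojure.java.io :as io])

;; *path*, query, query-db, convert-to-map and print-person-as-xml are
;; placeholders you still need to supply.

;; Serialize file writes with a lock. Don't use dosync here: a transaction
;; can be retried, so it must never perform I/O.
(def write-lock (Object.))

(defn write-to-xml [person]
  (locking write-lock
    (with-open [w (io/writer *path* :append true)]
      (binding [*out* w]
        (print-person-as-xml person)))))

(defn resolve-relation [person table-name one-or-many]
  (let [result (query table-name (:id person))]
    (assoc person table-name (if (= :many one-or-many)
                               result
                               (first result)))))

(defn person-to-xml [person]
  (write-to-xml
   (-> person
       (resolve-relation "phones" :many)
       (resolve-relation "addresses" :many))))

(defn get-people []
  (map convert-to-map (query-db ...)))

;; One future per person; mapv is eager, so every future is started
;; right away (a lazy map would only start them when consumed).
(defn people-to-xml []
  (mapv (fn [person]
          (future (person-to-xml person)))
        (get-people)))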

You might consider using the Java executors framework (java.util.concurrent.Executors) to create a thread pool instead of starting one future per person.
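A sketch of that idea, assuming a fixed-size pool is acceptable; `run-on-pool` is a hypothetical helper name. It relies on Clojure functions implementing `java.util.concurrent.Callable`, so a vector of zero-argument functions can be passed straight to `ExecutorService.invokeAll`:

```clojure
(import '(java.util.concurrent Executors ExecutorService Future))

;; Run (f item) for every item on a fixed-size thread pool. invokeAll
;; blocks until all tasks are done and returns their Futures in task
;; order; the pool is always shut down afterwards.
(defn run-on-pool [pool-size f items]
  (let [^ExecutorService pool (Executors/newFixedThreadPool pool-size)]
    (try
      (let [tasks (mapv (fn [item] (fn [] (f item))) items)]
        (mapv #(.get ^Future %) (.invokeAll pool tasks)))
      (finally
        (.shutdown pool)))))
```

For example, `(run-on-pool 4 person-to-xml (get-people))` would cap the work at four threads. Note that `invokeAll` needs the whole task collection up front, so for very large tables you might submit tasks in batches instead.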
