
Idiomatic Haskell for database abstraction

Original post: 2011-04-27 08:58:51 · tags: database, haskell

In OOP languages I might write a database wrapper that encapsulates the database connection, manages the schema, and provides a few core operations such as exec, query, and prepare_and_execute. I might even have a separate database helper class to handle the schema, leaving the database abstraction to handle only connections. This would then be used by model wrappers/factories, which use the database abstraction class to create instances of model classes. Something along the lines of this UML diagram: [UML diagram]

What would be the preferred way to design such a system in idiomatic Haskell?

2 answers

The most used database abstraction library in Haskell is HDBC. With it, queries are simply represented as Strings with placeholders. Fewer people use HaskellDB, which provides a type-safe way to build queries. Nothing prevents you from defining your own data types to represent common queries, and custom functions to build them.
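As a minimal sketch of the last point, a query can be kept as an ordinary value: the SQL text with `?` placeholders plus its parameters. The `Param` and `Query` types, the builder functions, and the `users` table are all assumptions made for illustration; `Param` stands in for a driver's value type (with HDBC, such a pair would typically have its parameters converted with `toSql` and be passed to a function like `quickQuery'` or `run`).

```haskell
-- Stand-in for a driver's value type (an assumption, not an HDBC type).
data Param = PInt Int | PText String
  deriving (Show, Eq)

-- A query as plain data: SQL text with '?' placeholders and its parameters.
data Query = Query
  { sqlText   :: String
  , sqlParams :: [Param]
  } deriving (Show, Eq)

-- Builders for common queries against a hypothetical "users" table
-- with columns id, name and age.
userById :: Int -> Query
userById uid =
  Query "SELECT id, name, age FROM users WHERE id = ?" [PInt uid]

usersOlderThan :: Int -> Query
usersOlderThan minAge =
  Query "SELECT id, name, age FROM users WHERE age > ?" [PInt minAge]

insertUser :: String -> Int -> Query
insertUser name age =
  Query "INSERT INTO users (name, age) VALUES (?, ?)" [PText name, PInt age]
```

Because a `Query` is just a value, it can be logged, compared, or tested without touching a database, and the schema details stay inside the builder functions.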

Values in Haskell are immutable, so there is little use for a mutable object corresponding to a record in the database. Instead, I think it is more common to define user data types, together with functions that marshal values of these types and push them to or pull them from the database.
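A rough sketch of such marshalling, assuming a hypothetical `User` record and a simplified `Row` type (a list of strings standing in for whatever a real driver returns):

```haskell
import Text.Read (readMaybe)

-- Hypothetical record mirroring one row of a "users" table.
data User = User
  { userId   :: Int
  , userName :: String
  , userAge  :: Int
  } deriving (Show, Eq)

-- Simplified stand-in for a driver's row representation.
type Row = [String]

-- Marshal a value into a row.
toRow :: User -> Row
toRow (User i n a) = [show i, n, show a]

-- Unmarshal a row; Nothing if the shape or the numbers are wrong.
fromRow :: Row -> Maybe User
fromRow [i, n, a] = User <$> readMaybe i <*> pure n <*> readMaybe a
fromRow _         = Nothing

-- An "update" does not mutate anything: it builds a new value,
-- which can then be written back to the database.
birthday :: User -> User
birthday u = u { userAge = userAge u + 1 }
```

Returning `Maybe User` from `fromRow` makes a malformed row an ordinary value to handle rather than a runtime exception.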

Whenever database updates are necessary, they are likely to be run in some stateful monad over IO. This makes it possible to keep the connection open, for example, or to do something between requests.
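One possible shape for such a monad is `StateT` over `IO`, from the `transformers` package that ships with GHC. In this sketch the `Conn` type is a fake connection that only records the statements sent to it, and `DbState`, `execute`, `withSession` and `demo` are names invented for the example.

```haskell
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.State.Strict (StateT, runStateT, gets, modify')
import Data.IORef (IORef, newIORef, modifyIORef', readIORef)

-- Fake connection handle (an assumption): it only logs statements.
newtype Conn = Conn (IORef [String])

data DbState = DbState
  { dbConn    :: Conn
  , dbQueries :: Int   -- number of statements run in this session
  }

-- Database actions: session state threaded over IO.
type Db = StateT DbState IO

-- "Run" a statement: log it on the connection and bump the counter.
execute :: String -> Db ()
execute sql = do
  Conn logRef <- gets dbConn
  liftIO (modifyIORef' logRef (sql :))
  modify' (\s -> s { dbQueries = dbQueries s + 1 })

-- Open one connection, run a whole session against it, and return the
-- result, the statement count, and the statements in the order sent.
withSession :: Db a -> IO (a, Int, [String])
withSession action = do
  logRef <- newIORef []
  (result, final) <- runStateT action (DbState (Conn logRef) 0)
  sent <- reverse <$> readIORef logRef
  pure (result, dbQueries final, sent)

demo :: IO ((), Int, [String])
demo = withSession $ do
  execute "INSERT INTO users (name) VALUES ('ada')"
  liftIO (putStrLn "other work between requests, same connection")
  execute "UPDATE users SET name = 'Ada' WHERE name = 'ada'"
```

Code written against `Db` never sees how the connection is opened or closed; only `withSession` does. A `ReaderT` over `IO` would serve equally well when the session needs no counters or other changing state.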

Finally, functions are first class, so any function can be constructed on the fly. A function itself can therefore encapsulate whatever information you want.
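To illustrate, the wrapper object from the question can be approximated by a record of functions that close over their connection. This is only a sketch: the `Database` record, its fields, and the in-memory `newFakeDatabase` backend are hypothetical, and a real backend would close over an actual connection handle instead of an `IORef`.

```haskell
import Data.IORef (newIORef, modifyIORef', readIORef)

-- The "database object": a record of functions. Whatever the functions
-- need (connection, schema details) is captured in their closures.
data Database = Database
  { exec  :: String -> IO ()           -- run a statement
  , query :: String -> IO [[String]]   -- run a query, return rows
  , sent  :: IO [String]               -- statements seen so far (for the demo)
  }

-- Hypothetical in-memory backend: logs every statement and answers
-- every query with the same canned rows.
newFakeDatabase :: [[String]] -> IO Database
newFakeDatabase cannedRows = do
  logRef <- newIORef []
  let record sql = modifyIORef' logRef (sql :)
  pure Database
    { exec  = record
    , query = \sql -> record sql >> pure cannedRows
    , sent  = reverse <$> readIORef logRef
    }
```

Callers use it as `exec db "..."` and `query db "..."`. Swapping the backend means building a different `Database` value, with no class hierarchy involved.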

So, I think, the usual Haskell approach consists of:

algebraic data types to represent actual data (as immutable values)
the rest of the application to transform these values
functions which generate queries (encapsulate schema details, marshal data to/from Haskell data types)
(optionally) a stateful monad to run queries (hide details of database access)
functions which run the queries (hide details of database access)

The most idiomatic way of using Haskell with databases, and IMHO the most efficient one, is to cache the records in memory and use STM for in-memory transactions, so that the database serves only as storage. You can then use transactional variables (TVars) for record management. But you must define your own query language, and you need a mechanism for caching/uncaching and synchronization. That is, after all, what Java EJB3 and Hibernate do.
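A small sketch of the in-memory part of this idea, using the `stm` and `containers` packages that ship with GHC. The `Account` record, the `Cache` type and the functions below are assumptions made for the example; loading from and writing back to a real database is left out.

```haskell
import Control.Concurrent.STM
import qualified Data.Map.Strict as Map

-- Hypothetical cached record.
data Account = Account { owner :: String, balance :: Int }
  deriving (Show, Eq)

-- The cache: each record lives in its own TVar, indexed by key.
type Cache = TVar (Map.Map String (TVar Account))

newCache :: [Account] -> IO Cache
newCache accts = atomically $ do
  cells <- mapM (\a -> (,) (owner a) <$> newTVar a) accts
  newTVar (Map.fromList cells)

-- Move money between two cached records in one in-memory transaction.
-- Returns False, changing nothing, if a key is missing or funds are short.
transfer :: Cache -> String -> String -> Int -> STM Bool
transfer cache from to amount = do
  cells <- readTVar cache
  case (Map.lookup from cells, Map.lookup to cells) of
    (Just src, Just dst) -> do
      s <- readTVar src
      if balance s < amount
        then pure False
        else do
          writeTVar src s { balance = balance s - amount }
          modifyTVar' dst (\d -> d { balance = balance d + amount })
          pure True
    _ -> pure False

-- A consistent snapshot of all records, in key order, e.g. for writing
-- to the backing store.
snapshot :: Cache -> STM [Account]
snapshot cache = readTVar cache >>= mapM readTVar . Map.elems
```

Because `snapshot` runs as a single transaction, it never observes a transfer half done, which is the property a synchronization step needs before persisting the records.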

The TCache package defines DBRefs, which are persistent STM variables with TVar semantics. They can be part of a record and point to another record, and they are lightweight, so you can build your own abstraction on top of them. It also has an SQL-like query language, including field search, joins and full-text search. By default it persists to files: you only need to define a key for your Haskell record to get file persistence. For database persistence there is an IResource class where you define the read, write and delete operations for your records. Each record may have its own persistence. So all the database interaction is in a single location in the source code, and transactions in memory are orders of magnitude faster. TCache writes a coherent state each time it writes asynchronously to the database. It can write synchronously too.
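The general idea of a keyed record with pluggable read, write and delete operations can be sketched as follows. This is not TCache's API: the `Stored` class, the `Store` record and every function here are hypothetical and only loosely modelled on the description above. Consult the TCache documentation for the real `IResource` class.

```haskell
import Data.IORef (newIORef, modifyIORef', readIORef)
import qualified Data.Map.Strict as Map

-- Hypothetical class (NOT TCache's IResource): a record type supplies
-- its key and a way to serialize itself.
class Stored a where
  keyOf  :: a -> String
  encode :: a -> String
  decode :: String -> Maybe a

-- A backing store is just the three operations; a file or database
-- backend would provide the same record with different functions.
data Store = Store
  { readKey   :: String -> IO (Maybe String)
  , writeKey  :: String -> String -> IO ()
  , deleteKey :: String -> IO ()
  }

-- In-memory backend used here so the sketch needs no external packages.
memoryStore :: IO Store
memoryStore = do
  ref <- newIORef Map.empty
  pure Store
    { readKey   = \k -> Map.lookup k <$> readIORef ref
    , writeKey  = \k v -> modifyIORef' ref (Map.insert k v)
    , deleteKey = \k -> modifyIORef' ref (Map.delete k)
    }

save :: Stored a => Store -> a -> IO ()
save st x = writeKey st (keyOf x) (encode x)

load :: Stored a => Store -> String -> IO (Maybe a)
load st k = (>>= decode) <$> readKey st k

-- Example record: only a key and a serialization need to be defined.
data Person = Person { name :: String, age :: Int }
  deriving (Show, Eq)

instance Stored Person where
  keyOf    = name
  encode p = show (name p, age p)
  decode s = case reads s of
    [((n, a), "")] -> Just (Person n a)
    _              -> Nothing
```

All storage access goes through the three functions in `Store`, so the code that touches the database stays in one place, which is the property the answer highlights.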