How can I assemble SQL with object-oriented Perl?

I'm currently in charge of a process that seems to be very intimate with the database. My program/script/framework's goal is to make uniformity out of disparate data sources. Using a form of dependency injection, my process works fine at a very high level. The implementation of each data source type is hidden from the highest-level business abstraction of what's going on. Great. I have two questions.

1) I have a long paragraph of code (and it's the length that's bothering me) that assembles an SQL statement in Perl-space describing how to translate these different data sources into one homogeneous end format. So the SQL string always depends on the type of data I'm working with. The WHERE clause depends, the FROM clause depends, the INSERT clause depends, it all depends. It's the high level of depending-ness that's confusing me. How do I model this process in an object-oriented way? MagicObject->buildSQL? That's essentially what I have now, but it feels like all of the parts of the code know too much, hence its length.

2) If I have a function that does something (builds SQL?), do I pass in the business objects whole and then stringify them at the last minute? Or do I stringify them early and only let my function handle what it needs, as opposed to rendering the objects itself?

Edit: While I don't doubt the importance of ORMs, I do not believe we are in the ORM space yet. Imagine baseball data for the American, National, and Fictional leagues were all stored in wildly different formats with varying levels of normalization. It is the job of my process to read these data sources and put them in one unified, normalized pool. I feel the ORM space of acting on these objects happens after my process. I'm a sort of data janitor, if you will. There are essentially no business objects to act on yet, because the unified pool does not exist until I create it.

Edit^2: It's been brought to my attention that maybe I haven't described the problem space in enough detail. Here's an example.

Imagine you had to make a master database of all the criminals in the United States. Your company sells a product that sits atop this data and provides access to it in a clean, unified format.

This data is provided publicly by the 50 states, but in wildly different formats. Some are a single file of data, not normalized. Others are normalized tables in CSV format. Some are Excel documents. Some are TSVs. Some of the records provided are not even complete without manual intervention (other, manually created data sources).

The purpose of my project is to make a "driver" for each of the 50 states and make sure the end product of the process is a master database of criminals in a perfect relational model. Everything keyed correctly, the schema in perfect shape, etc.

You want to look at Fey. I started using it a few months ago on the job, and while the implementation still has rough edges due to its young age, the idea behind it is solid. For example, take a query lightly adapted from the manual:

my $user = $schema->table( 'user' );
my $q = Fey::SQL
    ->new_select
    ->select( $user->columns( 'user_id', 'username' ) )
    ->from( $user );

Now you could write a function like this:

sub restrict_with_group {
    my ( $q, $table, @group_id ) = @_;
    my $group = $schema->table( 'group' )->alias;
    $q
        ->from( $table, $group )
        ->where( $group->column( 'group_id' ), 'IN', @group_id );
}

This will add an inner join from user to group as well as a WHERE condition. And voila, you can write the following in the main program:

restrict_with_group( $q, $user, qw( 1 2 3 ) );

But this restrict_with_group function will work for any query on a table that has a foreign key to the group table! To use it, you pass the query you want to restrict and the table to which you want to apply the restriction, as well as the group IDs to which you want to restrict it.

In the end you say $q->sql( $dbh ) and you get back a string of SQL representing the query that you have built up in the $q object.

So basically Fey gives you the abstraction capabilities that native SQL is missing. You can extract reusable aspects from your queries and package them as separate functions.
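To make the idea of packaging query aspects as functions concrete without depending on Fey, here is a minimal sketch in core Perl. TinyQuery and its methods are hypothetical stand-ins invented for illustration, not Fey's API, and unlike the Fey version above this restriction filters on a group_id column directly instead of adding a join.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a query object; this is NOT Fey's API.
package TinyQuery;

sub new {
    my ($class) = @_;
    return bless { columns => [], tables => [], wheres => [], binds => [] }, $class;
}

# Each builder method records a fragment and returns the object for chaining.
sub select { my ($self, @cols) = @_; push @{ $self->{columns} }, @cols; return $self }
sub from   { my ($self, @tabs) = @_; push @{ $self->{tables} },  @tabs; return $self }

sub where {
    my ($self, $condition, @binds) = @_;
    push @{ $self->{wheres} }, $condition;
    push @{ $self->{binds} },  @binds;
    return $self;
}

# Render the SQL string only when asked; bind values are kept separately.
sub sql {
    my ($self) = @_;
    my $sql = 'SELECT ' . join(', ', @{ $self->{columns} })
            . ' FROM '  . join(', ', @{ $self->{tables} });
    $sql .= ' WHERE ' . join(' AND ', @{ $self->{wheres} }) if @{ $self->{wheres} };
    return $sql;
}

sub binds { return @{ $_[0]{binds} } }

package main;

# A reusable restriction: works on any query whose table has a group_id column.
sub restrict_with_group {
    my ($q, $table, @group_ids) = @_;
    my $placeholders = join(', ', ('?') x @group_ids);
    return $q->where("$table.group_id IN ($placeholders)", @group_ids);
}

my $q = TinyQuery->new->select('user.user_id', 'user.username')->from('user');
restrict_with_group($q, 'user', 1, 2, 3);

print $q->sql, "\n";
```

The point of the design is that restrict_with_group knows nothing about which columns were selected or which other conditions exist; it only appends its own fragment and bind values.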

Please do not write your own ORM. Use something like DBIx::Class.

All of the problems you mention have been solved, and the implementation has been tested in thousands of other applications. Stick to writing your app, not reimplementing libraries. You might not actually use DBIC in your app, but you should look at its implementation approach, especially how it incrementally builds ResultSets (which aren't sets of results, but rather deferred queries).
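The "deferred query" idea can be sketched in a few lines of core Perl. DeferredQuery, search and as_sql below are hypothetical names chosen for illustration; this is not DBIx::Class code, only a rough picture of refining a query step by step and producing SQL at the last moment.

```perl
use strict;
use warnings;

# Hypothetical class illustrating deferred queries; this is NOT DBIx::Class.
package DeferredQuery;

sub new {
    my ($class, %args) = @_;
    return bless {
        table  => $args{table},
        wheres => [ @{ $args{wheres} || [] } ],
        binds  => [ @{ $args{binds}  || [] } ],
    }, $class;
}

# search() never touches a database: it returns a new, narrower query
# and leaves the original untouched, so base queries can be shared.
sub search {
    my ($self, %conditions) = @_;
    my @wheres = @{ $self->{wheres} };
    my @binds  = @{ $self->{binds} };
    for my $column (sort keys %conditions) {
        push @wheres, "$column = ?";
        push @binds,  $conditions{$column};
    }
    return ref($self)->new(table => $self->{table}, wheres => \@wheres, binds => \@binds);
}

# Only here is the SQL string actually produced.
sub as_sql {
    my ($self) = @_;
    my $sql = "SELECT * FROM $self->{table}";
    $sql .= ' WHERE ' . join(' AND ', @{ $self->{wheres} }) if @{ $self->{wheres} };
    return ($sql, @{ $self->{binds} });
}

package main;

my $players     = DeferredQuery->new(table => 'player');
my $pitchers    = $players->search(position => 'P');
my $al_pitchers = $pitchers->search(league => 'AL');

my ($sql, @binds) = $al_pitchers->as_sql;
print "$sql\n";
```

Because each call returns a fresh object, $players and $pitchers remain usable as starting points for other refinements.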

If you don't want an ORM, but you want to assemble SQL from bits without direct string manipulation/concatenation, take a look at Fey, which may do what you want.

Update: Aristotle Pagaltzis's answer is much better. He actually gave examples of what Fey looks like and how it can help.

From a purely coding point of view, you have a long and complex piece of code on your hands. You don't like it. Why? I can only assume that there is some code duplication in there. Otherwise, what's not to like? So, refactor it to eliminate duplication... I know it sounds trite, but since you don't post the code, it's hard to be more specific. Maybe have an object that has methods for the FROM, WHERE and INSERT clauses, so that the SQL's infrastructure is not duplicated? I don't know exactly what to do, but eliminating the duplication is key.
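One way to read the "object with methods for each clause" suggestion is a base class that owns the statement skeleton while each data source overrides only its own clauses. The sketch below is an assumption about how that could look; the class names, table names and column names are all made up.

```perl
use strict;
use warnings;

# Hypothetical base class: owns the statement skeleton, nothing source-specific.
package SourceLoader;

sub new { my ($class, %args) = @_; return bless { %args }, $class }

# Subclasses override these; the base versions of the required ones fail loudly.
sub insert_clause { die ref(shift) . " must implement insert_clause\n" }
sub from_clause   { die ref(shift) . " must implement from_clause\n" }
sub where_clause  { return '' }    # optional: no restriction by default

# The only place that knows how the clauses fit together.
sub build_sql {
    my ($self) = @_;
    my $sql   = join ' ', $self->insert_clause, $self->from_clause;
    my $where = $self->where_clause;
    $sql .= " WHERE $where" if length $where;
    return $sql;
}

# One small subclass per data source (table and column names are made up).
package SourceLoader::AmericanLeague;
use parent -norequire, 'SourceLoader';

sub insert_clause { 'INSERT INTO player (name, team) SELECT full_name, club' }
sub from_clause   { 'FROM al_raw' }
sub where_clause  { 'active = 1' }

package SourceLoader::NationalLeague;
use parent -norequire, 'SourceLoader';

sub insert_clause {
    "INSERT INTO player (name, team) SELECT p.first_name || ' ' || p.last_name, t.name"
}
sub from_clause { 'FROM nl_people p JOIN nl_teams t ON t.id = p.team_id' }

package main;

for my $class (qw(SourceLoader::AmericanLeague SourceLoader::NationalLeague)) {
    print $class->new->build_sql, "\n";
}
```

With this split, adding a new source means writing one small subclass, and a change to how statements are assembled happens in build_sql alone.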

Unless I'm misunderstanding, this seems like an ETL (Extract/Transform/Load) application that hasn't figured out how to keep the three stages separate.

If the output model is only a table or two then you're probably just as well off using plain SQL. Otherwise, and especially if there are relationships between the tables you're inserting into, a decent ORM should simplify things.

Taking the 50-state idea, you can't really get away from having 50 "extract" processes, hopefully with a library of shared routines. I'd attack the problem one input source at a time, refactoring as I added new ones but being careful to encapsulate the variable parts so that I know exactly where changes will need to be made when a supplier changes their format.

The "transform" part shouldn't be too onerous: just take what you got and prepare it for output.
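Keeping the three stages separate might look roughly like the following sketch in core Perl. The two state formats, the field names and the criminal table are invented for illustration, and the load stage only produces a statement plus bind values instead of talking to a real database.

```perl
use strict;
use warnings;

# Extract: one routine per source, each returning raw rows as hashrefs.
# The formats and field names are invented for illustration.
my %extract = (
    # Tab-separated: "LAST, FIRST<TAB>OFFENSE"
    TX => sub {
        my ($text) = @_;
        return map {
            my ($name, $offense) = split /\t/, $_;
            +{ name => $name, offense => $offense }
        } split /\n/, $text;
    },
    # Comma-separated: "first,last,offense"
    OH => sub {
        my ($text) = @_;
        return map {
            my ($first, $last, $offense) = split /,/, $_;
            +{ first => $first, last => $last, offense => $offense }
        } split /\n/, $text;
    },
);

# Transform: turn each source's raw row into the one common record shape.
my %transform = (
    TX => sub {
        my ($row) = @_;
        my ($last, $first) = split /,\s*/, $row->{name};
        return {
            first_name => ucfirst(lc $first),
            last_name  => ucfirst(lc $last),
            offense    => lc $row->{offense},
        };
    },
    OH => sub {
        my ($row) = @_;
        return {
            first_name => $row->{first},
            last_name  => $row->{last},
            offense    => lc $row->{offense},
        };
    },
);

# Load: source-agnostic; it only ever sees the common record shape.
sub load_statement {
    my ($state, $rec) = @_;
    return (
        'INSERT INTO criminal (state, first_name, last_name, offense) VALUES (?, ?, ?, ?)',
        $state, @{$rec}{qw(first_name last_name offense)},
    );
}

my %input = (
    TX => "SMITH, JOHN\tBURGLARY\n",
    OH => "Jane,Doe,Fraud\n",
);

my @statements;
for my $state (sort keys %input) {
    for my $raw ($extract{$state}->($input{$state})) {
        push @statements, [ load_statement($state, $transform{$state}->($raw)) ];
    }
}

print join(' | ', @$_), "\n" for @statements;
```

When a supplier changes their format, only that state's extract and transform entries need to change; load_statement stays the same for every source.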

I think you're describing dynamic SQL: building the request programmatically at runtime. This is a common feature of Object Relational Mappers such as LINQ to SQL and LLBLGenPro, to name a few. Building one is no small task.

Generally, ORMs objectify the SQL language. You write a sort of "SQL Document Object Model (DOM)" that allows you to build SQL queries programmatically by representing them (for example) as a "Request" object. You then set properties on the Request object such as a Column collection, Table collection, and Join collection (these are just examples of one approach). The result would be a SQL request string, exposed as a property of the Request object.

You must also make it possible for the Request object to read the schema definition of your data sources. You mention that your WHERE clause is type-dependent. Your SQL assembler must therefore be able to read the schema and build the clause appropriately.
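As a rough illustration of a schema-aware assembler, the sketch below picks the comparison for each WHERE fragment from a column's declared type. The %schema layout, the type names and the comparison chosen per type are all assumptions made for the example, not a recommendation for any particular database.

```perl
use strict;
use warnings;

# Hypothetical schema description: table => { column => type }.
my %schema = (
    criminal => { id => 'integer', last_name => 'text', arrested_on => 'date' },
);

# Build one WHERE fragment, choosing the comparison from the column's type.
sub where_fragment {
    my ($table, $column, $value) = @_;
    my $type = $schema{$table}{$column}
        or die "Unknown column $table.$column\n";

    return ("LOWER($table.$column) = LOWER(?)", $value) if $type eq 'text';
    return ("$table.$column >= ?", $value)              if $type eq 'date';
    return ("$table.$column = ?", $value);
}

# Assemble a full SELECT from a table name and a list of [column, value] pairs.
sub build_request {
    my ($table, @criteria) = @_;
    die "Unknown table $table\n" unless $schema{$table};

    my (@wheres, @binds);
    for my $criterion (@criteria) {
        my ($fragment, $bind) = where_fragment($table, @$criterion);
        push @wheres, $fragment;
        push @binds,  $bind;
    }

    my $sql = "SELECT * FROM $table";
    $sql .= ' WHERE ' . join(' AND ', @wheres) if @wheres;
    return ($sql, @binds);
}

my ($sql, @binds) = build_request(
    'criminal',
    [ last_name   => 'smith' ],
    [ arrested_on => '2020-01-01' ],
);
print "$sql\n";
```

Unknown tables and columns are rejected before any SQL is produced, which is one practical benefit of letting the assembler consult the schema.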

This may be overkill for your case. I think the fundamental question is, do you absolutely require dynamic SQL queries, or is there a less complex option that will satisfy your requirements?

It seems to me that the approach you are taking to solve this problem may need another look. You currently have multiple data sources which you need to treat as if they were a single data source. So why keep them as separate data sources?

Depending on how frequently the data is updated (or, from a performance standpoint, how often it is accessed), you could combine the data into a temporary data source such as SQLite. If the data from each state has a translator that takes it from format A to your common format in an SQLite table, then you can use your choice of methods to access it.

This method also allows for flexibility, as your data access needs may change. For example, suppose you are asked, "How many blond drivers had speeding tickets in each state?" The SQLite database can answer this with a single query, while other solutions would likely require returning a set of data which then needs to be parsed, grouped, and formatted for output.

If you don't want to deal with an ORM, you can do what I often do, with code like this:

my (@columns, @tables, @wheres, @order_bys, @values);

... # Add values to those arrays as needed, using push.
... # Use ? placeholders for values to be quoted; push the values onto @values.

# Build the SQL statement, skipping clauses that have no parts
my $sql = "select " . join(",", @columns)
        . " from "  . join(",", @tables);
$sql .= " where "    . join(" and ", @wheres) if @wheres;
$sql .= " order by " . join(",", @order_bys)  if @order_bys;

my $sth = $dbh->prepare($sql);
$sth->execute(@values);

Simple, no need for an ORM, very customisable. Plus, I always find ORMs too heavy for the volume of data I'm dealing with, but that's another subject.
