

Stored Procedure vs. F#

Asked 2011-02-21 10:51:15 · Tags: sql, database, stored-procedures, f#

For most developers trained on stored procedures (SPs), there is no real choice between LINQ and stored procedures/functions. That may be true.

However, there is a fork in the road nowadays. Before I spend too much time on F# syntax, I would like more input on where the strengths (and weaknesses) of F# lie.

How will F# perform on this topic (compared with SPs)?

F# has to communicate with a database in some way: through a LINQ to SQL/Entity Framework application layer, or directly through any DbConnection. Nothing new there. But F# has the power of parallelism and less overhead in its work (Functional Programming with C#/F#). F# is also efficient as a layer between data and machine, much as C# is powerful as a layer between human and machine.

Would I really still let the database server handle a query over recursive (hierarchical) nodes, or just fetch the plain data into F# and process it there, encapsulated nicely as an object method call from C#?
Would a stored procedure still be the best option for scanning 50 million records to find orphans, or rows matching a criterion that hits 0.5% of them?
Would an SP or function still be best for a simple task such as finding the next parent node?
Would an SP still be best for collecting a million records and returning calculated sums and/or periods?
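To make the recursive-node question concrete, here is a minimal sketch of the two options for walking up a parent chain. It is written in Python with an in-memory SQLite database standing in for F# and SQL Server, and the `node` table is hypothetical: one version lets the database do the walk with a recursive CTE, the other fetches the plain parent map once and walks it in application code.

```python
import sqlite3

def make_tree(conn):
    # Hypothetical adjacency-list table: each node points at its parent (NULL for the root).
    conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER)")
    conn.executemany("INSERT INTO node (id, parent_id) VALUES (?, ?)",
                     [(1, None), (2, 1), (3, 2), (4, 3), (5, 1)])

def ancestors_in_sql(conn, node_id):
    # Walk up the tree inside the database; only the ancestor ids are returned.
    rows = conn.execute("""
        WITH RECURSIVE up(id, parent_id, depth) AS (
            SELECT id, parent_id, 0 FROM node WHERE id = ?
            UNION ALL
            SELECT n.id, n.parent_id, up.depth + 1
            FROM node AS n JOIN up ON n.id = up.parent_id
        )
        SELECT id FROM up WHERE depth > 0 ORDER BY depth
    """, (node_id,)).fetchall()
    return [r[0] for r in rows]

def ancestors_in_client(conn, node_id):
    # Fetch the whole id -> parent_id map, then walk it in application code.
    parent = dict(conn.execute("SELECT id, parent_id FROM node"))
    result = []
    current = parent.get(node_id)
    while current is not None:
        result.append(current)
        current = parent.get(current)
    return result

conn = sqlite3.connect(":memory:")
make_tree(conn)
print(ancestors_in_sql(conn, 4))     # [3, 2, 1]
print(ancestors_in_client(conn, 4))  # [3, 2, 1]
```

Both return the same ancestors; which one is preferable depends on table size, indexing, and whether the parent map can be cached and reused on the client.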

Wouldn't a single F# DLL built entirely on the single responsibility principle be more useful than stored procedures hooked up inside SQL Server? There are pros and cons, of course. But what are they?

4 answers

Stored procedures are not magically super-fast. Often, they're actually rather slow.

Many people will downvote this answer, offering anecdotal evidence that a stored procedure once made an application faster overall. However, in every such example where I've actually seen the code, they had completely rethought some bad SQL in order to package it as an SP. I submit that the discipline of repackaging bad SQL into a procedure helped more than the SP itself.

Most of your points can't be evaluated without a measured benchmark.

I suggest that you do the following:

1. Write it in F#.
2. Measure it.
3. If it's too slow for your production application, try some stored procedures to see whether they are faster. If it's fast enough for your production application, then you have your answer: F# worked for you. For your application. For your data. For your architecture.
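The write-it-then-measure loop needs nothing more elaborate than timing both candidates on the same data. A minimal sketch, assuming Python's sqlite3 module in place of F# and SQL Server and a hypothetical `sale` table; it compares summing inside the database against shipping every row to the client and summing there.

```python
import sqlite3
import time

def timed(label, fn, repeat=5):
    # Run fn several times and keep the best wall-clock time; return (label, seconds, result).
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = fn()
        best = min(best, time.perf_counter() - start)
    return label, best, result

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO sale (amount) VALUES (?)",
                 ((i % 100,) for i in range(100_000)))

def sum_in_db():
    # Candidate 1: let the database aggregate and return a single row.
    return conn.execute("SELECT SUM(amount) FROM sale").fetchone()[0]

def sum_in_client():
    # Candidate 2: pull every row into the application and aggregate there.
    return sum(amount for (amount,) in conn.execute("SELECT amount FROM sale"))

for label, seconds, total in (timed("database", sum_in_db), timed("client", sum_in_client)):
    print(f"{label:>8}: {seconds * 1000:.1f} ms, total={total}")
```

Timings from an embedded SQLite database say nothing about a networked SQL Server; the harness is the point, and you would aim it at your own data and your own candidate implementations.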

There's no "general" answer. That said, my benchmarks for some particular kinds of queries indicate that the SP engine is pretty slow compared with Java. F# will probably be faster than the SP engine as well.

The important thing is to make sure that the database, if it's going to hold "pure" data, is already optimized so that queries like the one you describe (scanning 50 million records to find orphans or rows matching a criterion that hits 0.5% of them) retrieve the rows as quickly as possible. This often involves tweaking buffers, array sizes, and other elements of the database-to-F# connection, which usually means you want a more direct connection so that you can adjust those sizes.
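Tuning array sizes on the connection usually means controlling how many rows come back per fetch. A minimal sketch using the Python DB-API's `cursor.arraysize` and `fetchmany()`, with SQLite standing in for the real database and a hypothetical `item` table: rows are streamed in batches rather than loaded all at once. With an embedded database this only bounds memory; with a networked driver the corresponding setting (called fetch size, array size, or prefetch rows, depending on the driver) is what controls round trips.

```python
import sqlite3

def stream_rows(conn, sql, params=(), batch_size=10_000):
    # Pull rows in fixed-size batches instead of materializing the whole result set.
    # batch_size plays the role of the "array size" knob mentioned above.
    cursor = conn.cursor()
    cursor.arraysize = batch_size
    cursor.execute(sql, params)
    while True:
        batch = cursor.fetchmany()  # fetches cursor.arraysize rows by default
        if not batch:
            break
        yield from batch

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO item (id) VALUES (?)", ((i,) for i in range(25_000)))

count = sum(1 for _ in stream_rows(conn, "SELECT id FROM item", batch_size=4_096))
print(count)  # 25000
```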

Databases are efficient for certain tasks (e.g. when they can use an index to find a specific row), but they probably won't be any faster than F# if you need to process all rows and update them (in the database) or calculate some new result based on all the data.
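The difference between an index-driven lookup and whole-table work shows up directly in the query plan. A minimal sketch, using SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module and a hypothetical `orders` table: the selective lookup should report a search using the index, while the aggregate over every row reports a full scan. The exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER)")
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
conn.executemany("INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                 ((i % 500, i % 97) for i in range(20_000)))

def plan(sql, params=()):
    # EXPLAIN QUERY PLAN returns one row per step; the readable description is the last column.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

# A selective lookup can use the index and touch only the matching rows.
print(plan("SELECT amount FROM orders WHERE customer_id = ?", (42,)))
# An aggregate over every row has to read the whole table, wherever it runs.
print(plan("SELECT SUM(amount) FROM orders"))
```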

As S. Lott suggests, the best option is to try implementing what you need in F#, and you'll find out. Parallelism can give you some performance benefits, especially if you're doing computationally heavy calculations. However, you may still want to store the data in a database, then load it and process it in F# (I believe this is how adCenter at Microsoft used F#).

Possibly the most important point is that databases give you various guarantees about the consistency of the data: no matter what happens, you'll still end up in a consistent state. Implementing this yourself may be tricky (e.g. when updating data), but you need to consider whether you need it or not.
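The consistency guarantee is easiest to see with a transaction that fails halfway through. A minimal sketch with Python's sqlite3 module and a hypothetical `account` table: a transfer credits one account and then debits another inside one transaction, and when the debit violates a CHECK constraint, the already-applied credit is rolled back too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account (id, balance) VALUES (?, ?)", [(1, 100), (2, 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    # Using the connection as a context manager commits on success and rolls back
    # on any exception, so the credit and the debit land together or not at all.
    with conn:
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))

transfer(conn, 1, 2, 30)
try:
    transfer(conn, 1, 2, 500)  # would overdraw account 1, so the CHECK constraint fails
except sqlite3.IntegrityError:
    pass

print(conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall())  # [(1, 70), (2, 30)]
```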

You ask this:

Would a stored procedure still be the best option for scanning 50 million records to find orphans, or rows matching a criterion that hits 0.5% of them?

I take your question to mean: "I have this data in SQL Server. Should I query it in SQL or in client code (F# in this case)?" Queries like this should absolutely be performed in SQL if at all possible. If you do it in F#, you're transferring those 50 million rows to the client just to do some aggregation or lookups.
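The cost difference is about what crosses the wire. A minimal sketch, assuming Python's sqlite3 module as a stand-in for SQL Server and F#, with hypothetical `parent`/`child` tables in which 0.5% of the children are orphans: the SQL version returns only the orphan ids, while the client version must first pull every child row and every parent id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER);
""")
conn.executemany("INSERT INTO parent (id) VALUES (?)", ((i,) for i in range(1, 1001)))
# Every 200th child (0.5% of 10,000) points at a parent id that does not exist.
conn.executemany("INSERT INTO child (id, parent_id) VALUES (?, ?)",
                 ((i, (i % 1000 + 1) if i % 200 else (5000 + i)) for i in range(1, 10_001)))

def orphans_in_sql(conn):
    # The filter runs in the database; only the matching ids travel back.
    return [r[0] for r in conn.execute("""
        SELECT c.id FROM child AS c
        WHERE NOT EXISTS (SELECT 1 FROM parent AS p WHERE p.id = c.parent_id)
        ORDER BY c.id""")]

def orphans_in_client(conn):
    # Every parent id and every child row must be shipped to the client first.
    parent_ids = {pid for (pid,) in conn.execute("SELECT id FROM parent")}
    return [cid for cid, pid in conn.execute("SELECT id, parent_id FROM child ORDER BY id")
            if pid not in parent_ids]

print(len(orphans_in_sql(conn)))  # 50
```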

I hope I understood your question correctly.

As I understand it, an SP just means you call a precompiled execution plan, and you can call it through an API instead of pushing a query string to the server. Both of these save time on the order of milliseconds, nowhere near a second. For larger queries, that difference is negligible. SPs are good for high-frequency/high-throughput work (and, of course, for encapsulating complex logic, but that doesn't seem to apply here).

Because an SP uses a precompiled plan, it can indeed be slower than a normal query: the plan is not recompiled against the current statistics of the underlying data (the execution plan has already been compiled). Since you mention a condition that applies to 0.5% of the rows, this could be important, because selectivity like that is exactly what decides whether an index lookup or a full scan is the better plan.

In the discussion of SP vs. F#, I would reframe it as "on the server" vs. "on the client". If you're talking about larger data volumes (50M rows qualifies), my first choice would always be to "put the mill where the wood is", meaning execute on the server if possible. Only if some very complicated logic is involved might you want to consider F#, but I don't think that applies here. Even then, I'd prefer to execute that on the server rather than first drag all those rows over the network (potentially slow).

GJ