How do Relational Databases Work Under the Hood?

Original post: 2010-05-21 23:01:09 6 4 sql/ database/ parsing

I've always been interested in how you can throw some SQL at a database and have it return your results nearly instantaneously and in an orderly manner, all without ever thinking about it as anything other than a black box.

What is really going on?

I'm pretty sure it has something to do with how values are laid out regularly in memory, similar to an array; but aside from that, I don't know much else.

How is SQL parsed in a way that facilitates all of this?

4 Answers

The engine builds a so-called query plan.

It's a set of algorithms used to return the sets that you described logically with an SQL query.

Almost every engine lets you see which query plan it will build for a given query.

In MySQL and PostgreSQL, you prepend your query with the word EXPLAIN
In SQL Server, you run SET SHOWPLAN_TEXT ON before running the query, or just press Ctrl-L in Management Studio
In Oracle, you prepend the query with EXPLAIN PLAN FOR and then issue SELECT * FROM TABLE(dbms_xplan.display)
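
To see what a plan looks like without installing any of the engines above, here is a minimal sketch using SQLite through Python's built-in sqlite3 module. SQLite is not one of the engines listed; its equivalent command is EXPLAIN QUERY PLAN. The table users and the index idx_users_email are made up for illustration, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

# Hypothetical schema, used only to give the planner something to work with.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users (email)")


def plan_for(sql):
    """Return the human-readable plan steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the description


# Filtering on the indexed column: the plan should mention the index.
indexed = plan_for("SELECT name FROM users WHERE email = 'a@example.com'")

# Filtering on a column with no index: the plan should be a full table scan.
unindexed = plan_for("SELECT email FROM users WHERE name = 'Ada'")

print(indexed)
print(unindexed)
```

The same logical kind of query (filter one table on one column) gets two different algorithms depending on which indexes exist, which is exactly what a query plan records.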

You may find this article on my blog interesting:

Double-thinking in SQL

which addresses the same question.

In a basic sense, for many RDBMSs:

a) The syntax analysis stage takes input from the server setup (sockets, whatever) and turns the SQL into a valid AST or another intermediate form.
b) It then passes this information to a storage engine, which turns the query description into a set of lookups on indexes, tables, partitions, replicated data and the other elements that make up the semantics of storing the schema.
c) The engine then returns a set of data, which is provided to the client in whatever form it expects (XML, CSV, client-specific).
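
Steps a) to c) can be sketched as a toy pipeline. This is only an illustration under heavy assumptions: it understands a single SELECT shape, the "storage" is a Python list with one hypothetical index on id, the table name is parsed but otherwise ignored, and every name in it (TABLE, INDEXES, parse, plan, execute, to_csv, run) is invented for the example. Real engines are far more elaborate at every stage.

```python
import csv
import io
import re

# Hypothetical in-memory "storage": one table plus one index on id.
TABLE = [
    {"id": 1, "name": "ada"},
    {"id": 2, "name": "grace"},
    {"id": 3, "name": "edgar"},
]
INDEXES = {"id": {row["id"]: row for row in TABLE}}

# Only "SELECT cols FROM table [WHERE col = integer]" is understood.
QUERY = re.compile(
    r"SELECT\s+(?P<cols>[\w,\s]+?)\s+FROM\s+(?P<table>\w+)"
    r"(?:\s+WHERE\s+(?P<col>\w+)\s*=\s*(?P<val>\d+))?\s*;?\s*$",
    re.IGNORECASE,
)


def parse(sql):
    """(a) Turn SQL text into a small AST (here just a dict)."""
    m = QUERY.match(sql.strip())
    if m is None:
        raise ValueError("unsupported SQL: " + sql)
    where = None
    if m.group("col"):
        where = (m.group("col"), int(m.group("val")))
    cols = [c.strip() for c in m.group("cols").split(",")]
    return {"columns": cols, "table": m.group("table"), "where": where}


def plan(ast):
    """(b) Choose an access path: index lookup if possible, else a scan."""
    where = ast["where"]
    if where and where[0] in INDEXES:
        return ("index_lookup", where[0], where[1])
    if where:
        return ("scan_filter", where[0], where[1])
    return ("scan_all",)


def execute(ast, step):
    """(b, continued) Run the chosen plan against the storage."""
    if step[0] == "index_lookup":
        row = INDEXES[step[1]].get(step[2])
        rows = [row] if row else []
    elif step[0] == "scan_filter":
        rows = [r for r in TABLE if r[step[1]] == step[2]]
    else:
        rows = list(TABLE)
    return [[r[c] for c in ast["columns"]] for r in rows]


def to_csv(ast, rows):
    """(c) Serialize the result set for the client, here as CSV."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(ast["columns"])
    writer.writerows(rows)
    return out.getvalue()


def run(sql):
    ast = parse(sql)
    return to_csv(ast, execute(ast, plan(ast)))


print(run("SELECT name FROM users WHERE id = 2"))
```

The point of keeping plan separate from execute is that the same AST can be answered by different access paths, and the client never sees which one was used.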

But there isn't one true answer. You will find similarities in indexing algorithms, distribution algorithms, caching, locking and other things ... but the main similarity is the language interface of SQL itself. Beyond that, they can be implemented in any way they wish ... provided their results meet the expected semantics of the input query.

Really, RDBMSs contain all kinds of structures from computer science ... and each has highly developed and specialized methods for turning the implied semantics of SQL into concrete storage operations.

Think of how different MySQL and Oracle are ... or PostgreSQL and Microsoft SQL Server. They all attempt to meet some kind of common SQL-like specification ... but how that specification is fulfilled varies widely.

Engines incorporate all kinds of exotica: specialist indexes to find the data's physical location, caching systems and more.
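
As a rough illustration of an index that finds the data's physical location, here is a minimal sketch: fixed-width records are appended to a byte buffer standing in for a data file, and a dictionary maps each id to the byte offset of its record. The record layout (a 32-bit id plus a 16-byte name) and the function names are assumptions made for the example, not how any particular engine stores rows.

```python
import io
import struct

# Hypothetical record layout: little-endian int32 id + 16-byte padded name.
RECORD = struct.Struct("<i16s")


def write_rows(buf, rows):
    """Append rows to the buffer and return an index of id -> byte offset."""
    index = {}
    for row_id, name in rows:
        index[row_id] = buf.tell()
        buf.write(RECORD.pack(row_id, name.encode("utf-8")))
    return index


def fetch(buf, index, row_id):
    """Jump straight to a record's offset instead of reading the whole file."""
    offset = index.get(row_id)
    if offset is None:
        return None
    buf.seek(offset)
    rid, raw = RECORD.unpack(buf.read(RECORD.size))
    return rid, raw.rstrip(b"\x00").decode("utf-8")


heap = io.BytesIO()  # stands in for a data file on disk
index = write_rows(heap, [(1, "ada"), (2, "grace"), (3, "edgar")])
print(fetch(heap, index, 2))
```

With the index, fetching one row costs one seek and one fixed-size read regardless of how many rows the file holds.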

There are tons of open source databases, such as MySQL and PostgreSQL, and search systems, such as Sphinx, whose implementations you can look at. Open source is for learning as much as anything! Try to find a 'mentor' to guide you through the source.

I'm pretty sure it has something to do with how values are laid out regularly in memory, similar to an array; but aside from that, I don't know much else.

You might also want to look up articles on B+ trees. That's the data structure most mainstream relational databases use for their indexes.
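
To make the idea concrete, here is a simplified sketch of a B+ tree that is bulk-loaded once from sorted data and then only read. It leaves out inserts, deletes, node splitting and disk pages, which are most of the work in a real implementation, and assumes unique keys. The class and function names and the fanout of 4 are arbitrary choices for the example.

```python
from bisect import bisect_left, bisect_right


class Leaf:
    def __init__(self, keys, values):
        self.keys = keys
        self.values = values
        self.next = None  # link to the right sibling, used for range scans


class Internal:
    def __init__(self, keys, children):
        self.keys = keys  # separators; len(children) == len(keys) + 1
        self.children = children


def smallest_key(node):
    while isinstance(node, Internal):
        node = node.children[0]
    return node.keys[0]


def build(items, fanout=4):
    """Bulk-load a read-only B+ tree from (key, value) pairs (unique keys)."""
    items = sorted(items, key=lambda kv: kv[0])
    leaves = []
    for i in range(0, len(items), fanout):
        chunk = items[i:i + fanout]
        leaves.append(Leaf([k for k, _ in chunk], [v for _, v in chunk]))
    if not leaves:
        leaves = [Leaf([], [])]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    level = leaves
    while len(level) > 1:  # build parent levels until one root remains
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            separators = [smallest_key(child) for child in group[1:]]
            parents.append(Internal(separators, group))
        level = parents
    return level[0]


def find_leaf(root, key):
    node = root
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def lookup(root, key):
    leaf = find_leaf(root, key)
    i = bisect_left(leaf.keys, key)
    if i < len(leaf.keys) and leaf.keys[i] == key:
        return leaf.values[i]
    return None


def range_scan(root, lo, hi):
    """Return all (key, value) pairs with lo <= key <= hi, in key order."""
    leaf, out = find_leaf(root, lo), []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > hi:
                return out
            if k >= lo:
                out.append((k, v))
        leaf = leaf.next
    return out


root = build([(i, "row%d" % i) for i in range(1, 21)])
print(lookup(root, 7))
print(range_scan(root, 5, 8))
```

A point lookup touches one node per level, and a range scan descends once and then follows the leaf links, which is why this structure suits both WHERE id = 7 and WHERE id BETWEEN 5 AND 8.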

You can read these books:

[1] H. Garcia-Molina, Database System Implementation, Prentice Hall, 2000

[2] R. Elmasri, S. B. Navathe, Fundamentals of Database Systems, The Benjamin/Cummings Publishing Company, Inc., 1994