How to optimize SQL self-join query?

I've got two tables (and these are part of a third-party application, so I can't change their schemas):

  • doc has doc_id, is_active, part_number, name, and description.
  • node has node_id, doc_id, and parent_node_id.

node.doc_id refers to a doc.doc_id value (there isn't a foreign key relationship) and node.parent_node_id refers to a node.node_id value, setting up a parent/child relationship of the values in the doc table. Each entry in the doc table can have zero or one parent, and any number of children.

For a given part number, I need the name and description for all matching entries in the doc table AND (here's the tricky part) for each such matching entry I need to know whether that entry has any active children.

Here's an example:

doc
doc_id   is_active   part_number   name   description
1        T           AAA           Fred   Little
2        T           AAA           George Middle
3        T           AAA           Sam    Morse
4        T           CCC           Mary   Moo
5        T           DDD           Carol  Smith
6        F           DDD           Midge  Moo

node
node_id   doc_id   parent_node_id
10        1        null
11        2        null
12        3        null
13        4        10
14        5        10
15        6        11

So you can see that doc_id 1 has 2 children (doc_ids 4 and 5) and doc_id 2 has one child (doc_id 6).

Graphically:

doc[doc_id=1] -> node[doc_id=1,node_id=10]; nodes with parent_node_id=10 are node[node_id=13,doc_id=4] and node[node_id=14,doc_id=5]; both doc[doc_id=4] and doc[doc_id=5] have is_active=T.

doc[doc_id=2] -> node[doc_id=2,node_id=11]; the only node with parent_node_id=11 is node[node_id=15,doc_id=6] but doc[doc_id=6] has is_active=F.

If I make my request for part_number=AAA, I need to get back:

doc_id   name    description    has_active_children
1        Fred    Little         T
2        George  Middle         F
3        Sam     Morse          F

Right now I've got this query, where I count the number of children (which is unnecessary, but it's the only thing I could figure out):

select d1.*,
   (select count(dn.node_id) from node dn
    inner join doc dc on dn.doc_id=dc.doc_id
    where dn.parent_node_id=
      (select dx.node_id from node dx where dx.doc_id=d1.doc_id)
    and dc.is_active='T') as childCount
from doc d1 where d1.part_number='AAA'

This works but isn't terribly fast. We're running on SQL Server, and I tried the "set showplan_all" but didn't understand the output well enough to make any changes.

Is there an obviously better way to do this query? Or, is there a document that would help me understand the showplan output?

This should be a decent starting point. I've had to do similar self-joins in a database that held a hierarchical representation of geographical regions.

If you change either of the two LEFT JOIN clauses into a plain JOIN, any parent that doesn't have a child will be removed from the query results.

SELECT parent.[doc_id],
       parent.[name],
       parent.[description],
       parent.[part_number],
       -- COUNT ignores NULLs, so a parent with no active child gets 'F'
       CASE WHEN COUNT(child.[doc_id]) > 0 THEN 'T' ELSE 'F' END AS has_active_children
FROM       doc   parent
JOIN       node  parentRef ON parent.[doc_id] = parentRef.[doc_id]
LEFT JOIN  node  childRef  ON parentRef.[node_id] = childRef.[parent_node_id]
-- Filter on is_active in the ON clause (not in WHERE) so parents without active children are kept
LEFT JOIN  doc   child     ON child.[doc_id] = childRef.[doc_id] AND child.[is_active] = 'T'
WHERE parent.[part_number] = 'AAA'
GROUP BY parent.[doc_id], parent.[name], parent.[part_number], parent.[description]

EDIT: Adding part_number to the query
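Since the question only needs a yes/no flag and not a count, a correlated EXISTS is another option worth trying. The following is a minimal sketch against the doc and node tables as described in the question; the alias has_active_children is taken from the desired output, and the other aliases are illustrative. Whether it actually runs faster depends on the data and the available indexes, so compare the plans before adopting it.

```sql
-- Sketch only: assumes the doc/node columns described in the question.
SELECT d.doc_id,
       d.name,
       d.description,
       CASE WHEN EXISTS (
                -- Walk from this doc's node to its child nodes, then to the child docs
                SELECT 1
                FROM   node parentRef
                JOIN   node childRef ON childRef.parent_node_id = parentRef.node_id
                JOIN   doc  child    ON child.doc_id = childRef.doc_id
                WHERE  parentRef.doc_id = d.doc_id
                  AND  child.is_active = 'T'
            )
            THEN 'T' ELSE 'F'
       END AS has_active_children
FROM   doc d
WHERE  d.part_number = 'AAA';
```

EXISTS can stop at the first active child it finds, and no GROUP BY is needed. A doc row that has no node row at all simply comes back with 'F' instead of being dropped.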

Also, as with any SQL query, look into the indexes that exist on these tables. You may be able to add an index or two to improve the query's performance.
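As a rough illustration, the columns used for joining and filtering here are node.parent_node_id, node.doc_id, and doc.part_number. The index names below are hypothetical, and the third-party application may already define equivalent indexes, so list the existing ones first and create only what is missing.

```sql
-- List the indexes that already exist on each table
EXEC sp_helpindex 'node';
EXEC sp_helpindex 'doc';

-- Hypothetical index names; skip any that duplicate an existing index
CREATE NONCLUSTERED INDEX IX_node_parent_node_id
    ON node (parent_node_id) INCLUDE (doc_id);   -- finding the children of a node

CREATE NONCLUSTERED INDEX IX_node_doc_id
    ON node (doc_id);                            -- finding the node for a doc

CREATE NONCLUSTERED INDEX IX_doc_part_number
    ON doc (part_number);                        -- the WHERE part_number = ... filter
```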

You can use a window function, which is a good fit for your case.

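-- DISTINCT collapses the one-row-per-child result back to one row per parent
SELECT DISTINCT
       parent.[doc_id],
       parent.[name],
       parent.[description],
       parent.[part_number],
       -- Windowed COUNT ignores NULLs, so parents with no active child get 'F'
       CASE WHEN COUNT(child.[doc_id]) OVER (PARTITION BY parent.[doc_id]) > 0
            THEN 'T' ELSE 'F' END AS has_active_children
FROM       doc   parent
JOIN       node  parentRef ON parent.[doc_id] = parentRef.[doc_id]
LEFT JOIN  node  childRef  ON parentRef.[node_id] = childRef.[parent_node_id]
-- is_active belongs in the ON clause; in WHERE it would drop parents without active children
LEFT JOIN  doc   child     ON child.[doc_id] = childRef.[doc_id] AND child.[is_active] = 'T'
WHERE parent.[part_number] = 'AAA'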
