
How to avoid nested subqueries in SQL

I've just added a tagging system to my website and I'm trying to figure out the most efficient way to run scalable queries. Here's a basic working MySQL query to return tag matches for a given user:

SELECT
   scans.scan_index,
   scans.scan_id,
   scans.archive_folder 
FROM
   tags 
   INNER JOIN
      interpretationtags USING (tagid) 
   INNER JOIN
      interpretations USING (interpretation_id) 
   INNER JOIN
      scans 
      ON scans.scan_id = interpretations.scan_id 
      AND scans.archive_folder = interpretations.archive_folder 
   INNER JOIN
      archives 
      ON scans.archive_folder = archives.archive_folder 
WHERE
   archives.user_id = "google-authd...." 
   AND tags.tag = "tag1"

But it gets sticky when I want to query multiple tags for the same scan. You see, tags are present in different interpretations, and there are multiple interpretations for each scan. Here's a working query for two tags using a subquery:

SELECT
   a.scan_index,
   a.scan_id,
   a.archive_folder 
FROM
   (
      SELECT
         scans.scan_index,
         scans.scan_id,
         scans.archive_folder 
      FROM
         tags 
         INNER JOIN
            interpretationtags USING (tagid) 
         INNER JOIN
            interpretations USING (interpretation_id) 
         INNER JOIN
            scans 
            ON scans.scan_id = interpretations.scan_id 
            AND scans.archive_folder = interpretations.archive_folder 
         INNER JOIN
            archives 
            ON scans.archive_folder = archives.archive_folder 
      WHERE
         archives.user_id = "google-auth2..." 
         AND tags.tag = "tag1"
   )
   as a 
   INNER JOIN
      interpretations 
      ON a.scan_id = interpretations.scan_id 
      AND a.archive_folder = interpretations.archive_folder 
   INNER JOIN
      interpretationtags USING(interpretation_id) 
   INNER JOIN
      tags USING(tagid) 
WHERE
   tags.tag = "tag2"

Since this is running on a LAMP stack, I've written some PHP code to iterate over the tags I'd like to include in this AND-style search, building a multi-nested query. Here's one with three tags:

SELECT
   b.scan_index,
   b.scan_id,
   b.archive_folder 
FROM
   (
      SELECT
         a.scan_index,
         a.scan_id,
         a.archive_folder 
      FROM
         (
            SELECT
               scans.scan_index,
               scans.scan_id,
               scans.archive_folder 
            FROM
               tags 
               INNER JOIN
                  interpretationtags USING (tagid) 
               INNER JOIN
                  interpretations USING (interpretation_id) 
               INNER JOIN
                  scans 
                  ON scans.scan_id = interpretations.scan_id 
                  AND scans.archive_folder = interpretations.archive_folder 
               INNER JOIN
                  archives 
                  ON scans.archive_folder = archives.archive_folder 
            WHERE
               archives.user_id = "google..." 
               AND tags.tag = "tag1"
         )
         as a 
         INNER JOIN
            interpretations 
            ON a.scan_id = interpretations.scan_id 
            AND a.archive_folder = interpretations.archive_folder 
         INNER JOIN
            interpretationtags USING(interpretation_id) 
         INNER JOIN
            tags USING(tagid) 
      WHERE
         tags.tag = "tag2"
   )
   as b 
   INNER JOIN
      interpretations 
      ON b.scan_id = interpretations.scan_id 
      AND b.archive_folder = interpretations.archive_folder 
   INNER JOIN
      interpretationtags USING(interpretation_id) 
   INNER JOIN
      tags USING(tagid) 
WHERE
   tags.tag = "tag3"

Even 4 nested subqueries run fast with minimal data, but I just don't see this being a scalable solution when I'm dealing with 100k rows of data. How can I accomplish this without resorting to this ugly, inefficient code?

It's hard to be certain without table structures and sample data, but I think you're going about this in the wrong direction. You should start from scans and find all the appropriate tags, and then filter on those (which should then be a simple IN expression):

SELECT
   scans.scan_index,
   scans.scan_id,
   scans.archive_folder 
FROM
   scans
   INNER JOIN
      archives 
      ON scans.archive_folder = archives.archive_folder 
   INNER JOIN
      interpretations 
      ON scans.scan_id = interpretations.scan_id 
      AND scans.archive_folder = interpretations.archive_folder 
   INNER JOIN
      interpretationtags USING (interpretation_id) 
   INNER JOIN
      tags USING (tagid) 
WHERE
   archives.user_id = "google-authd...." 
   AND tags.tag IN("tag1", "tag2")
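
One thing worth checking before relying on this: IN on its own matches scans that carry any of the listed tags, and it returns one row per matching tag. For the AND-style search described in the question, a common pattern is to group by scan and keep only the groups whose distinct tag count equals the number of tags requested. The sketch below illustrates the difference using Python's built-in sqlite3 module as an in-memory stand-in for MySQL. The table definitions, column types, and sample rows are assumptions made up for the demonstration (the question does not show the real schema), and the GROUP BY / HAVING clause is an addition, not part of the answer's query.

```python
import sqlite3

# Hypothetical minimal schema: column names follow the queries above,
# but types, keys, and sample rows are assumptions for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE archives (archive_folder TEXT, user_id TEXT);
CREATE TABLE scans (scan_index INTEGER, scan_id INTEGER, archive_folder TEXT);
CREATE TABLE interpretations (interpretation_id INTEGER, scan_id INTEGER, archive_folder TEXT);
CREATE TABLE tags (tagid INTEGER, tag TEXT);
CREATE TABLE interpretationtags (interpretation_id INTEGER, tagid INTEGER);

INSERT INTO archives VALUES ('folder1', 'user-a');
INSERT INTO scans VALUES (1, 10, 'folder1'), (2, 20, 'folder1');
-- scan 10 has two interpretations, scan 20 has one
INSERT INTO interpretations VALUES
   (100, 10, 'folder1'), (101, 10, 'folder1'), (200, 20, 'folder1');
INSERT INTO tags VALUES (1, 'tag1'), (2, 'tag2');
-- scan 10 carries tag1 and tag2 (on different interpretations); scan 20 only tag1
INSERT INTO interpretationtags VALUES (100, 1), (101, 2), (200, 1);
""")

# The flat query from the answer, with placeholders instead of literals.
any_match_sql = """
SELECT scans.scan_index, scans.scan_id, scans.archive_folder
FROM scans
INNER JOIN archives ON scans.archive_folder = archives.archive_folder
INNER JOIN interpretations
   ON scans.scan_id = interpretations.scan_id
   AND scans.archive_folder = interpretations.archive_folder
INNER JOIN interpretationtags USING (interpretation_id)
INNER JOIN tags USING (tagid)
WHERE archives.user_id = ?
  AND tags.tag IN (?, ?)
"""

# Same query, restricted to scans that carry every requested tag.
all_match_sql = any_match_sql + """
GROUP BY scans.scan_index, scans.scan_id, scans.archive_folder
HAVING COUNT(DISTINCT tags.tag) = 2
"""

params = ("user-a", "tag1", "tag2")
any_rows = conn.execute(any_match_sql, params).fetchall()
all_rows = conn.execute(all_match_sql, params).fetchall()

print("any tag :", sorted(any_rows))  # scan 10 appears twice, scan 20 once
print("all tags:", all_rows)          # only scan 10
```

The number compared in HAVING has to equal the number of distinct tags in the IN list; COUNT(DISTINCT ...) is used so that a scan tagged twice with the same tag on different interpretations is not counted as a match.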

Note that, based on your SELECT field list, you don't need any columns from archives; the JOIN to archives is only required because the WHERE clause filters on archives.user_id.
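
With a flat query, the loop that currently builds one nested subquery per tag only has to emit one placeholder per tag. The following is a minimal sketch of such a builder, written in Python rather than the PHP mentioned in the question. The function name `build_all_tags_query` is hypothetical, the `?` placeholder style is an assumption (MySQL drivers often use a different marker), and the GROUP BY / HAVING COUNT(DISTINCT ...) clause that enforces "all tags" is an addition to the query shown above.

```python
from typing import List, Sequence, Tuple


def build_all_tags_query(user_id: str, tags: Sequence[str]) -> Tuple[str, List[str]]:
    """Return (sql, params) for scans of one user that carry every tag in `tags`."""
    # Drop duplicate tags while keeping order, so the HAVING count stays correct.
    unique_tags = list(dict.fromkeys(tags))
    if not unique_tags:
        raise ValueError("at least one tag is required")

    # One placeholder per tag; values are bound by the driver, not concatenated.
    placeholders = ", ".join("?" for _ in unique_tags)
    sql = (
        "SELECT scans.scan_index, scans.scan_id, scans.archive_folder\n"
        "FROM scans\n"
        "INNER JOIN archives ON scans.archive_folder = archives.archive_folder\n"
        "INNER JOIN interpretations\n"
        "   ON scans.scan_id = interpretations.scan_id\n"
        "   AND scans.archive_folder = interpretations.archive_folder\n"
        "INNER JOIN interpretationtags USING (interpretation_id)\n"
        "INNER JOIN tags USING (tagid)\n"
        "WHERE archives.user_id = ?\n"
        f"  AND tags.tag IN ({placeholders})\n"
        "GROUP BY scans.scan_index, scans.scan_id, scans.archive_folder\n"
        f"HAVING COUNT(DISTINCT tags.tag) = {len(unique_tags)}"
    )
    return sql, [user_id, *unique_tags]


# Example: three distinct tags (the repeated "tag1" is ignored).
sql, params = build_all_tags_query("user-a", ["tag1", "tag2", "tag3", "tag1"])
print(sql)
print(params)
```

The query text keeps the same shape no matter how many tags are requested; only the placeholder list and the count in HAVING change, so there is a single SELECT instead of one nesting level per tag.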
