
MySQL is not using INDEX in subquery

I have these tables and queries as defined in sqlfiddle.

My first problem was to list people so that each one shows the LEFT JOINed visits row with the newest year. I solved that using a subquery.

Now my problem is that the subquery is not using the INDEX defined on the visits table. That causes my query to run nearly indefinitely on tables with approximately 15000 rows each.

Here's the query. The goal is to list every person once, with their newest (by year) record in the visits table.

Unfortunately, on large tables it gets really slow because it's not using the INDEX in the subquery.

SELECT *
FROM people
LEFT JOIN (
  SELECT *
  FROM visits
  ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id

Does anyone know how to force MySQL to use the INDEX already defined on the visits table?

Your query:

SELECT *
FROM people
LEFT JOIN (
  SELECT *
  FROM visits
  ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id;
  • First, it uses non-standard SQL syntax: items appear in the SELECT list that are not part of the GROUP BY clause, are not aggregate functions, and do not depend on the grouping items. This can give indeterminate (semi-random) results.

  • Second, to avoid the indeterminate results, you have added an ORDER BY inside a subquery. Non-standard or not, nothing in the MySQL documentation says that this should work as expected. So it may be working now, but it may stop working in the not so distant future, when you upgrade to MySQL version X, where the optimizer will be clever enough to understand that an ORDER BY inside a derived table is redundant and can be eliminated.
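
One way to see the first point in practice, as a rough sketch: MySQL has an ONLY_FULL_GROUP_BY SQL mode that rejects this kind of SELECT list instead of silently picking values. The example assumes the people and visits tables from the question. The SET statement below replaces any other SQL modes for the session, which is acceptable for a quick test but would need adjusting for real use.

```sql
-- Enable strict GROUP BY checking for this session only.
-- Note: this replaces any other SQL modes set for the session.
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';

-- With the mode on, the original query is expected to be rejected with an
-- error rather than return an arbitrary visits row per person, because the
-- visits columns are neither aggregated nor determined by people.id.
SELECT *
FROM people
LEFT JOIN (
  SELECT *
  FROM visits
  ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id;
```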

Try using this query:

SELECT 
    p.*, v.*
FROM 
    people AS p
  LEFT JOIN 
        ( SELECT 
              id_people
            , MAX(year) AS year
          FROM
              visits
          GROUP BY
              id_people
         ) AS vm
      JOIN
          visits AS v
        ON  v.id_people = vm.id_people
        AND v.year = vm.year 
    ON  v.id_people = p.id;

See the SQL-fiddle.

A compound index on (id_people, year) would help efficiency.
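
A minimal sketch of what that could look like, assuming the visits table from the question. The index name idx_visits_people_year is made up for illustration. Running EXPLAIN on the grouped subquery afterwards is one way to check whether the optimizer actually picks the index up; the exact plan output depends on the MySQL version and the data.

```sql
-- Hypothetical index name; the columns are the ones used in the queries above.
CREATE INDEX idx_visits_people_year ON visits (id_people, year);

-- Inspect the plan for the "latest year per person" subquery.
-- Look at the key and Extra columns of the output to see whether
-- the new index is being used.
EXPLAIN
SELECT id_people, MAX(year) AS year
FROM visits
GROUP BY id_people;
```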


A different approach: it works fine if you first limit the people to a sensible number (say 30) and then join to the visits table:

SELECT 
    p.*, v.*
FROM 
    ( SELECT *
      FROM people
      ORDER BY name
        LIMIT 30
    ) AS p
  LEFT JOIN 
    visits AS v
      ON  v.id_people = p.id
      AND v.year =
    ( SELECT 
          year
      FROM
          visits
      WHERE
          id_people = p.id
      ORDER BY
          year DESC
        LIMIT 1
     )  
ORDER BY name ;

Why do you have a subquery when all you need is a table name for joining?

It is also not obvious to me why your query has a GROUP BY clause in it. GROUP BY is ordinarily used with aggregate functions like MAX or COUNT, but you don't have those.

How about this? It may solve your problem.

    SELECT people.id, people.name, MAX(visits.year) year
      FROM people
      JOIN visits ON people.id = visits.id_people
  GROUP BY people.id, people.name

If you need to show the person, the most recent visit, and the note from the most recent visit, you're going to have to explicitly join the visits table again to the summary query (virtual table), like so.

SELECT a.id, a.name, a.year, v.note
  FROM (
         SELECT people.id, people.name, MAX(visits.year) year
          FROM people
          JOIN visits ON people.id = visits.id_people
      GROUP BY people.id, people.name
  )a
  JOIN visits v ON (a.id = v.id_people and a.year = v.year)

Go fiddle: http://www.sqlfiddle.com/#!2/d67fc/20/0

If you need to show something for people who have never had a visit, you should try replacing the JOIN items in my statement with LEFT JOIN.
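
As a sketch of that substitution, here is the summary query above with both joins switched to LEFT JOIN. It assumes the same people and visits tables. People without any visits should come back with NULL in the year and note columns, since there is nothing for the outer join to match.

```sql
SELECT a.id, a.name, a.year, v.note
  FROM (
         -- LEFT JOIN keeps people with no visits; their MAX(year) is NULL.
         SELECT people.id, people.name, MAX(visits.year) AS year
           FROM people
           LEFT JOIN visits ON people.id = visits.id_people
          GROUP BY people.id, people.name
       ) a
  -- For people with a NULL year nothing matches here, so note is NULL too.
  LEFT JOIN visits v ON (a.id = v.id_people AND a.year = v.year);
```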

As someone else wrote, an ORDER BY clause in a subquery is not standard and generates unpredictable results. In your case it baffled the optimizer.

Edit: GROUP BY is a big hammer. Don't use it unless you need it. And don't use it unless you use an aggregate function in the query.

Notice that if a person has more than one row in visits for their most recent year, this query will generate multiple rows for that person, one for each visit in that year. If you want just one row per person, and you DON'T need the note for the visit, then the first query will do the trick. If a person has more than one visit in a year and you only need the latest one, you have to identify which row IS the latest one. Usually it will be the one with the highest ID number, but only you know that for sure. I added another person to your fiddle with that situation: http://www.sqlfiddle.com/#!2/4f644/2/0
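
To find out whether your data has this situation at all, a quick check along these lines may help. It is only a sketch against the visits table from the question, and it lists every person and year with more than one visit, not just the most recent year.

```sql
-- Person/year pairs that have more than one visit row.
-- Any pair that is also that person's latest year will produce
-- duplicate rows in the query above.
SELECT id_people, year, COUNT(*) AS visits_in_year
  FROM visits
 GROUP BY id_people, year
HAVING COUNT(*) > 1;
```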

This is complicated. But: if your visits.id numbers are automatically assigned and they are always in time order, you can simply report the highest visit id and be guaranteed that you'll have the latest year. This will be a very efficient query.

SELECT p.id, p.name, v.year, v.note
  FROM (
         SELECT id_people, max(id) id
          FROM visits
      GROUP BY id_people
  )m
  JOIN people p ON (p.id = m.id_people)
  JOIN visits v ON (m.id = v.id)

http://www.sqlfiddle.com/#!2/4f644/1/0

But this is not the way your example is set up, so you need another way to disambiguate your latest visit and get just one row per person. The only trick we have at our disposal is to use the largest id number.

So, we need to get a list of the visits.id numbers that are the latest ones, by this definition, from your tables. This query does that, with a MAX(year)...GROUP BY(id_people) query nested inside a MAX(id)...GROUP BY(id_people) query.

  SELECT v.id_people,
         MAX(v.id) id
    FROM (
         SELECT id_people, 
                MAX(year) year
           FROM visits
          GROUP BY id_people
         )p
    JOIN visits v ON (p.id_people = v.id_people AND p.year = v.year)
   GROUP BY v.id_people

The overall query (http://www.sqlfiddle.com/#!2/c2da2/1/0) is this.

SELECT p.id, p.name, v.year, v.note
  FROM (
      SELECT v.id_people,
             MAX(v.id) id
        FROM (
             SELECT id_people, 
                    MAX(year) year
               FROM visits
              GROUP BY id_people
             )p
        JOIN visits v ON (     p.id_people = v.id_people 
                           AND p.year = v.year)
       GROUP BY v.id_people
      )m
   JOIN people p ON (m.id_people = p.id)
   JOIN visits v ON (m.id = v.id)

Disambiguation in SQL is a tricky business to learn, because it takes some time to wrap your head around the idea that there's no inherent order to rows in a DBMS.
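
For comparison, here is a sketch of another way to spell out the ordering explicitly. It is not taken from the queries above: it swaps in a window function, so it assumes MySQL 8.0 or later, and the alias names ranked and rn are made up for illustration. It applies the same tie-break as above (newest year first, then highest visits.id), and it uses LEFT JOIN so that people without visits are still listed once.

```sql
-- Assumes MySQL 8.0+; earlier versions do not support window functions.
SELECT id, name, year, note
  FROM (
         SELECT p.id, p.name, v.year, v.note,
                -- Number each person's visits: newest year first,
                -- then highest visit id to break ties within a year.
                ROW_NUMBER() OVER (
                    PARTITION BY p.id
                    ORDER BY v.year DESC, v.id DESC
                ) AS rn
           FROM people p
           LEFT JOIN visits v ON v.id_people = p.id
       ) ranked
 WHERE rn = 1;
```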
