Help me turn a SUBQUERY into a JOIN

Question

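Two tables.

emails: id (int10) | ownership (int10)

messages: emailid (int10) indexed | message (mediumtext)

Subquery (which is terrible in MySQL).

SELECT COUNT(*) FROM messages WHERE message LIKE '%word%' AND emailid IN (SELECT id FROM emails WHERE ownership = 32)

The usage here is that I run a search on emails (obviously simplified in the example above), which generates a list of 3,000 email ids. I then want to search messages, because I need to do a text match, but only against the messages belonging to those 3,000 emails.

The query against messages is expensive (message is not indexed), but that is fine because it would only be checking a few rows.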

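Ideas:

i) A join. My attempts at this so far have not worked and have resulted in full table scans of the messages table (i.e. the emailid index was not used).
ii) A temporary table. I think this could work.
iii) Cache the ids in the client and run 2 queries. This does work. Not elegant.
iv) Subquery. MySQL subqueries run the second query each time, so this does not work. Maybe fixed in MySQL 6.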
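Idea (ii) could look roughly like the following. This is a minimal sketch, assuming the simplified emails and messages tables from the question and that emails.id is unique; the temporary table name matching_emails is hypothetical.

```sql
-- Hypothetical temporary table holding only the ids of the emails of interest.
CREATE TEMPORARY TABLE matching_emails (
  id INT NOT NULL PRIMARY KEY
);

-- Step 1: run the cheap, indexed search once and keep the ids.
INSERT INTO matching_emails (id)
SELECT id FROM emails WHERE ownership = 32;

-- Step 2: join the small id list to messages so the emailid index can be
-- used, and apply the expensive LIKE only to the joined rows.
SELECT COUNT(*)
FROM matching_emails AS me
JOIN messages AS m ON m.emailid = me.id
WHERE m.message LIKE '%word%';

DROP TEMPORARY TABLE matching_emails;
```

Running EXPLAIN on the second statement would show whether messages is actually reached through the emailid index.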

OK, here is what I have so far. These are the actual field names (I had simplified things a bit in the question).

Query:

SELECT COUNT(*) FROM ticket LEFT JOIN ticket_subject 
ON (ticket_subject.ticketid = ticket.id) 
WHERE category IN (1) 
AND ticket_subject.subject LIKE "%about%"

Result:

1   SIMPLE  ticket  ref     PRIMARY,category    category    4   const   28874    
1   SIMPLE  ticket_subject  eq_ref  PRIMARY     PRIMARY     4   deskpro.ticket.id   1   Using where

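It takes 0.41 seconds and returns a COUNT(*) of 113.

Running:

SELECT COUNT(*) FROM ticket WHERE category IN (1)

takes 0.01 seconds and finds 33,000 results.

Running:

SELECT COUNT(*) FROM ticket_subject WHERE subject LIKE "%about%"

takes 0.14 seconds and finds 1,300 results.

Both the ticket table and the ticket_subject table have 300,000 rows.

There is an index on ticket_subject.ticketid and on ticket.category.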

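I now realise that using the LIKE syntax in the example was a mistake, because it has been a bit of a red herring that points people towards FULLTEXT. That is not the issue. The issue is:

1) Table A: a very fast query that runs on an index. 0.001 seconds
2) Table B: a moderate-to-slow query with no index, which does a full table scan. 0.1 seconds

Both of these results are fine. The problem is that I have to join them, and the search then takes 0.3 seconds. This makes no sense to me, because the slow part of the combined query, on Table B, should be quicker: we are now searching only a fraction of that table, i.e. it should not be doing a full table scan, because the field being joined on is indexed.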

Answer 1

Remember to take advantage of Boolean short-circuit evaluation:

SELECT COUNT(*) 
FROM messages 
join emails ON emails.id = messages.emailid
WHERE ownership = 32 AND message LIKE '%word%'

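This filters by ownership before evaluating the LIKE predicate. Always put the cheaper expression on the left.

Also, I agree with @Martin Smith and @MJB that you should consider using MySQL's FULLTEXT indexing to make this faster.

Re your comment and additional information, here is some analysis: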

explain SELECT COUNT(*) FROM ticket WHERE category IN (1)\G

           id: 1
  select_type: SIMPLE
        table: ticket
         type: ref
possible_keys: category
          key: category
      key_len: 4
          ref: const
         rows: 1
        Extra: Using index

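Note that "Using index" is a good thing to see, because it means the query can be satisfied just by reading the index data structure, without even touching the table's data. This is certain to run very fast.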

explain SELECT COUNT(*) FROM ticket_subject WHERE subject LIKE '%about%'\G

           id: 1
  select_type: SIMPLE
        table: ticket_subject
         type: ALL
possible_keys: NULL        <---- no possible keys
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
        Extra: Using where

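This shows that there are no possible keys that can benefit the wildcard LIKE predicate. It uses the condition in the WHERE clause, but it has to evaluate it by running a table scan.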

explain SELECT COUNT(*) FROM ticket LEFT JOIN ticket_subject 
ON (ticket_subject.ticketid = ticket.id) 
WHERE category IN (1) 
AND ticket_subject.subject LIKE '%about%'\G

           id: 1
  select_type: SIMPLE
        table: ticket
         type: ref
possible_keys: PRIMARY,category
          key: category
      key_len: 4
          ref: const
         rows: 1
        Extra: Using index

           id: 1
  select_type: SIMPLE
        table: ticket_subject
         type: ref
possible_keys: ticketid
          key: ticketid
      key_len: 4
          ref: test.ticket.id
         rows: 1
        Extra: Using where

Again, accessing the ticket table is quick, but that is spoiled by the table scan incurred by the LIKE condition.

ALTER TABLE ticket_subject ENGINE=MyISAM;

CREATE FULLTEXT INDEX ticket_subject_fulltext ON ticket_subject(subject);

explain SELECT COUNT(*) FROM ticket JOIN ticket_subject  
ON (ticket_subject.ticketid = ticket.id)  
WHERE category IN (1)  AND MATCH(ticket_subject.subject) AGAINST('about')

           id: 1
  select_type: SIMPLE
        table: ticket
         type: ref
possible_keys: PRIMARY,category
          key: category
      key_len: 4
          ref: const
         rows: 1
        Extra: Using index

           id: 1
  select_type: SIMPLE
        table: ticket_subject
         type: fulltext
possible_keys: ticketid,ticket_subject_fulltext
          key: ticket_subject_fulltext          <---- now it uses an index
      key_len: 0
          ref: 
         rows: 1
        Extra: Using where

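You are never going to be able to make LIKE perform well. See my presentation Practical Full-Text Search in MySQL.

Re your comment: Okay, I have done some experiments on a dataset of similar size (the Users and Badges tables in the Stack Overflow data dump :-). Here is what I found: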
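As a side note to the FULLTEXT example above: on MySQL 5.6 and later, InnoDB tables also accept FULLTEXT indexes, so the conversion to MyISAM may not be needed there. The sketch below assumes such a version and reuses the ticket and ticket_subject tables; the search term 'refund' is a made-up example. Very common words can be dropped by the full-text stopword list, so some terms may match nothing.

```sql
-- Assumes MySQL 5.6+ with ticket_subject stored as InnoDB.
-- Skip this statement if the index was already created earlier.
CREATE FULLTEXT INDEX ticket_subject_fulltext ON ticket_subject (subject);

-- Boolean mode: the leading + requires the (hypothetical) term to be present.
SELECT COUNT(*)
FROM ticket
JOIN ticket_subject ON ticket_subject.ticketid = ticket.id
WHERE ticket.category IN (1)
  AND MATCH(ticket_subject.subject) AGAINST('+refund' IN BOOLEAN MODE);
```

Unlike LIKE '%word%', a full-text match works on whole words, so it is not a drop-in replacement for substring matching.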

select count(*) from users
where reputation > 50000

+----------+
| count(*) |
+----------+
|       37 |
+----------+
1 row in set (0.00 sec)

This is really fast, because I have an index on the reputation column.

           id: 1
  select_type: SIMPLE
        table: users
         type: range
possible_keys: users_reputation_userid_displayname
          key: users_reputation_userid_displayname
      key_len: 4
          ref: NULL
         rows: 37
        Extra: Using where; Using index

select count(*) from badges
where badges.creationdate like '%06-24%'

+----------+
| count(*) |
+----------+
|     1319 |
+----------+
1 row in set, 1 warning (0.63 sec)

That is as expected, since the table has 700k rows and it has to do a table scan. Now let's do the join:

select count(*) from users join badges using (userid)
where users.reputation > 50000 and badges.creationdate like '%06-24%'

+----------+
| count(*) |
+----------+
|       19 |
+----------+
1 row in set, 1 warning (0.03 sec)

That does not seem so bad. Here is the explain report:

           id: 1
  select_type: SIMPLE
        table: users
         type: range
possible_keys: PRIMARY,users_reputation_userid_displayname
          key: users_reputation_userid_displayname
      key_len: 4
          ref: NULL
         rows: 37
        Extra: Using where; Using index

           id: 1
  select_type: SIMPLE
        table: badges
         type: ref
possible_keys: badges_userid
          key: badges_userid
      key_len: 8
          ref: testpattern.users.UserId
         rows: 1
        Extra: Using where

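That looks like it is using indexes intelligently for the join, and it helps that I have a compound index that includes userid and reputation. Remember that MySQL generally uses only one index per table in a given query, so it is important to define the right compound indexes for the queries you need to run.

Re your comment: OK, I have tried this with reputation > 5000, reputation > 500, and reputation > 50. These should match much larger sets of users.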
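For illustration, an index like the one named in the report might be defined as below. The column list and its order are an assumption inferred from the index name; the original definition is not shown in this thread.

```sql
-- Assumed definition: columns guessed from the index name.
CREATE INDEX users_reputation_userid_displayname
  ON users (reputation, userid, displayname);

-- With reputation as the leading column, the range condition can use the
-- index, and because userid is also stored in it, the join key can be read
-- from the index alone ("Using index" in the Extra column).
EXPLAIN SELECT COUNT(*)
FROM users JOIN badges USING (userid)
WHERE users.reputation > 50000
  AND badges.creationdate LIKE '%06-24%';
```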

select count(*) from users join badges using (userid)
where users.reputation > 5000 and badges.creationdate like '%06-24%'

+----------+
| count(*) |
+----------+
|      194 |
+----------+
1 row in set, 1 warning (0.27 sec)

select count(*) from users join badges using (userid)
where users.reputation > 500 and badges.creationdate like '%06-24%'

+----------+
| count(*) |
+----------+
|      624 |
+----------+
1 row in set, 1 warning (0.93 sec)

select count(*) from users join badges using (userid)
where users.reputation > 50 and badges.creationdate like '%06-24%'

+----------+
| count(*) |
+----------+
|     1067 |
+----------+
1 row in set, 1 warning (1.72 sec)

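The explain report is the same in all cases, but if the query finds more matching rows in the Users table, then it naturally has to evaluate the LIKE predicate against more matching rows in the Badges table.

It is true that there is some cost to doing a join. It is a little surprising that it is so dramatically expensive. But this can be mitigated if you use indexes.

I know you said you have a query that cannot use an index, but perhaps it is time to consider creating a redundant column that holds some transformed version of the original column's data, so that you can index it. In the example above, I might create a column creationdate_day and populate it from DAYOFYEAR(creationdate).

Here is what I mean: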

ALTER TABLE Badges ADD COLUMN creationdate_day SMALLINT;
UPDATE Badges SET creationdate_day = DAYOFYEAR(creationdate);
CREATE INDEX badge_creationdate_day ON Badges(creationdate_day);

select count(*) from users join badges using (userid)
where users.reputation > 50 and badges.creationdate_day = dayofyear('2010-06-24')

+----------+
| count(*) |
+----------+
|     1067 |
+----------+
1 row in set, 1 warning (0.01 sec)  <---- not too shabby!

Here is the explain report:

           id: 1
  select_type: SIMPLE
        table: badges
         type: ref
possible_keys: badges_userid,badge_creationdate_day
          key: badge_creationdate_day    <---- here is our new index
      key_len: 3
          ref: const
         rows: 1318
        Extra: Using where

           id: 1
  select_type: SIMPLE
        table: users
         type: eq_ref
possible_keys: PRIMARY,users_reputation_userid_displayname
          key: PRIMARY
      key_len: 8
          ref: testpattern.badges.UserId
         rows: 1
        Extra: Using where
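On MySQL 5.7 and later, the redundant column could instead be declared as a stored generated column, so the server keeps it in sync and no separate UPDATE is needed. This is a sketch under that version assumption and is an alternative to the ALTER TABLE / UPDATE / CREATE INDEX sequence above, not something to run in addition to it.

```sql
-- Assumes MySQL 5.7+ and that creationdate_day does not exist yet.
ALTER TABLE Badges
  ADD COLUMN creationdate_day SMALLINT
    GENERATED ALWAYS AS (DAYOFYEAR(creationdate)) STORED,
  ADD INDEX badge_creationdate_day (creationdate_day);

-- The query is unchanged: it filters on the indexed derived column.
SELECT COUNT(*)
FROM users JOIN badges USING (userid)
WHERE users.reputation > 50
  AND badges.creationdate_day = DAYOFYEAR('2010-06-24');
```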

Answer 2

SELECT COUNT(*) 
FROM messages 
join emails ON emails.id = messages.emailid
WHERE message LIKE '%word%' 
AND ownership = 32

But the problem is with '%word%': this will always require a scan of message. If you are using MyISAM, you might want to look at full-text search.

Answer 3

I think this is what you are looking for:

select count(*)
from messages m
  inner join emails e
    on e.id = m.emailid
where m.message like '%word%'
  and e.ownership = 32

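It is hard to tell for sure how it will perform. If the full table scan (FTS) is caused by the leading wildcard on WORD, then doing it this way will not solve the problem. But the good news is that the join may limit the records in the messages table that you have to look at.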

Answer 4

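Is there any way you could turn your join around the other way? It seems the second query is the less expensive one, and since the whole thing is a simple join, you would want to perform the less expensive query first to narrow the data set, then join to the more expensive query.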
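One way to experiment with the join order in MySQL is STRAIGHT_JOIN, which makes the optimizer read the left table before the right one. The following is a sketch against the simplified emails and messages tables from the question; whether it helps depends on the data, so it is worth comparing the EXPLAIN output for both orders.

```sql
-- Read emails first (cheap, filtered on ownership), then look up the
-- matching messages through the emailid index and apply LIKE to those rows.
EXPLAIN SELECT COUNT(*)
FROM emails AS e
STRAIGHT_JOIN messages AS m ON m.emailid = e.id
WHERE e.ownership = 32
  AND m.message LIKE '%word%';
```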

Answer 1: score 8, accepted, 2010-06-23 12:32:26
Answer 2: score 3, 2010-06-23 12:19:12
Answer 3: score 2, 2010-06-23 12:19:55
Answer 4: score 0, 2010-06-23 12:21:59