
Complex SQL query optimization

I'm trying to optimize an SQL query. Can you help me?

Basically, each user has friends through a friendship table, and each user has many feed_events through a user_feed_events table. I'm trying to list the feed_events of the friends of a given user. Shouldn't be impossible, right? :)

As you can see, the performance of the query depends on how many friends a user has. Right now, the query for a user with 150 friends takes almost 7 seconds to execute.

UPDATE: here is how my friendship table is built:

create_table "friendships", :force => true do |t|
  t.integer  "user_id",     :null => false
  t.integer  "friend_id",   :null => false
  t.datetime "created_at"
  t.datetime "accepted_at"
end

add_index "friendships", ["friend_id"], :name => "index_friendships_on_friend_id"
add_index "friendships", ["user_id"], :name => "index_friendships_on_user_id"

First I ask Rails for the ids of the user's friends, then I interpolate that string into the real query.

friends_id = current_user.friends.collect {|f| f.id}.join(",")

sql = "
SELECT 
DISTINCT feed_events.id, 
feed_events.event_type, 
feed_events.type_id, 
feed_events.data, 
feed_events.created_at, 
feed_events.updated_at, 
user_feed_events.user_id  
FROM feed_events 
LEFT JOIN user_feed_events 
ON feed_events.id = user_feed_events.feed_event_id 
WHERE user_feed_events.user_id IN (#{friends_id}) 
ORDER BY feed_events.created_at DESC"

Then I actually execute the query (paginating it and limiting it to 30 results per page):

@events = FeedEvent.paginate_by_sql(sql, :page => params[:page], :per_page => 30)
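One caveat with the interpolated list above: if the user has no friends, friends_id is an empty string, and "IN ()" is a syntax error in PostgreSQL. A minimal sketch of a guard in plain Ruby follows; the helper name friend_id_list is hypothetical and is not part of Rails or has_many_friends.

```ruby
# Hypothetical helper: turns a list of ids into a string that can be
# interpolated into "IN (...)". Returns nil when there is nothing to query.
def friend_id_list(ids)
  # Integer() raises on strings that are not integer literals
  # (e.g. "1; DROP ..."), so arbitrary text never reaches the SQL.
  clean = ids.map { |id| Integer(id) }.uniq
  clean.empty? ? nil : clean.join(",")
end

puts friend_id_list([1, 7, "9", 7]).inspect  # "1,7,9"
puts friend_id_list([]).inspect              # nil
```

When the helper returns nil, the caller can skip the query entirely and render an empty feed.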

UPDATE #2: here is the EXPLAIN ANALYZE output:

    SQL> EXPLAIN ANALYZE (SELECT  DISTINCT feed_events.id,  feed_events.event_type,  feed_events.type_id,  feed_events.data,  feed_events.created_at,  feed_events.updated_at,  user_feed_events.user_id   FROM user_feed_events  INNER JOIN feed_events  ON feed_events.id = user_feed_events.feed_event_id  WHERE user_feed_events.user_id IN (1,7,9,8,14,15,20,35,40,39,41,42,57,84,98,109,121,74,129,64,137,77,172,182,206,201,284,31,94,232,311,168,30,114,50,174,419,403,438,464,423,513,351,349,385,622,751,359,809,838,844,962,831,786,896,1001,992,998,990,256,67,623,957,1226,1060,1009,1490,132,1467,1672,619,1459,1466,993,1599,1365,607,1381,1714,1154,2032,2230,2240,2354,598,2345,1804,634,1900,2652,1975,2164,1759,3288,1004,3487,3507,3542,3566,514,3787,3137,3803,3090,4012,855,17,2026,1463,335,1000,935,5,12,10,13,19,18,16,22,34,27,29,59,126,90,46,23,63,291,134,229,107,439,521)  ORDER BY feed_events.created_at DESC)

    Unique  (cost=6090.87..6162.93 rows=18014 width=389) (actual time=1641.210..1733.010 rows=29691 loops=1)
      ->  Sort  (cost=6090.87..6099.88 rows=18014 width=389) (actual time=1641.206..1670.882 rows=29694 loops
            Sort Key: feed_events.created_at, feed_events.id, feed_events.event_type, feed_events.type_id, feed_events.data, feed_events.updated_at, user_feed_events.user_id
            Sort Method:  quicksort  Memory: 17755k
            ->  Hash Join  (cost=3931.63..5836.21 rows=18014 width=389) (actual time=258.541..361.345 rows=29694 loops=1)
                  Hash Cond: (user_feed_events.feed_event_id = feed_events.id
                  ->  Bitmap Heap Scan on user_feed_events  (cost=926.64..2745.66 rows=18014 width=8) (actual time=6.930..42.367 rows=29694 loops=1)
                        Recheck Cond: (user_id = ANY ('{1,7,9,8,14,15,20,35,40,39,41,42,57,84,98,109,121,74,129,64,137,77,172,182,206,201,284,31,94,232,311,168,30,114,50,174,419,403,438,464,423,513,351,349,385,622,751,359,809,838,844,962,831,786,896,1001,992,998,990,256,67,623,957,1226,1060,1009,1490,132,1467,1672,619,1459,1466,993,1599,1365,607,1381,1714,1154,2032,2230,2240,2354,598,2345,1804,634,1900,2652,1975,2164,1759,3288,1004,3487,3507,3542,3566,514,3787,3137,3803,3090,4012,855,17,2026,1463,335,1000,935,5,12,10,13,19,18,16,22,34,27,29,59,126,90,46,23,63,291,134,229,107,439,521}'::integer[]))
                        ->  Bitmap Index Scan on index_user_feed_events_on_user_id  (cost=0.00..925.74 rows=18014 width=0) (actual time=6.836..6.836 rows=29694 loops=1)
                              Index Cond: (user_id = ANY ('{1,7,9,8,14,15,20,35,40,39,41,42,57,84,98,109,121,74,129,64,137,77,172,182,206,201,284,31,94,232,311,168,30,114,50,174,419,403,438,464,423,513,351,349,385,622,751,359,809,838,844,962,831,786,896,1001,992,998,990,256,67,623,957,1226,1060,1009,1490,132,1467,1672,619,1459,1466,993,1599,1365,607,1381,1714,1154,2032,2230,2240,2354,598,2345,1804,634,1900,2652,1975,2164,1759,3288,1004,3487,3507,3542,3566,514,3787,3137,3803,3090,4012,855,17,2026,1463,335,1000,935,5,12,10,13,19,18,16,22,34,27,29,59,126,90,46,23,63,291,134,229,107,439,521}'::integer[]))
                  ->  Hash  (cost=2848.84..2848.84 rows=44614 width=385) (actual time=251.490..251.490 rows=44663 loops=1)
                        ->  Seq Scan on feed_events  (cost=0.00..2848.84 rows=44614 width=385) (actual time=0.035..77.044 rows=44663 loops=1)
    Total runtime: 1780.200 ms
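Reading the plan above from the bottom up, the Hash Join has returned its last row at about 361 ms, while the Sort node does not return its first row until about 1641 ms. A small sketch of that arithmetic in plain Ruby, with the figures copied from the plan (the variable names are mine, and this is only a rough reading of the timings):

```ruby
# "actual time" figures in ms, copied from the EXPLAIN ANALYZE output above.
hash_join_done = 361.345   # Hash Join has returned its last row
sort_first_row = 1641.206  # Sort returns its first row
total_runtime  = 1780.200  # Total runtime

# Time between the join finishing and the sort producing its first row.
sort_phase_ms = (sort_first_row - hash_join_done).round(3)
share = (sort_phase_ms / total_runtime * 100).round(1)

puts "sort phase: #{sort_phase_ms} ms (#{share}% of total)"
```

This suggests that most of the measured time goes into ordering roughly 29,700 wide rows (width=389) on every DISTINCT column, even though only 30 rows are shown per page.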

UPDATE #3: The problem is that my Rails application uses the has_many_friends plugin (https://github.com/swemoney/has_many_friends) to manage friendships. It works like this: I'm user_id #6 and I ask user_id #10 for friendship. When user #10 accepts, a new row is added to the table with user_id = 6 and friend_id = 10. If user #10 asks me for friendship instead, the row is user_id = 10 and friend_id = 6.

This means that to find friends_by_me I need to search on "user_id = 6", and to find friends_for_me I need to search on "friend_id = 6". To find all of my friends I need to search both columns. Doesn't this make creating joins very complicated? How would you handle this?

The only alternative I can think of is:

"(SELECT 
DISTINCT feed_events.id, 
feed_events.event_type, 
feed_events.type_id, 
feed_events.data, 
feed_events.created_at, 
feed_events.updated_at, 
user_feed_events.user_id 
FROM feed_events 
INNER JOIN user_feed_events 
ON feed_events.id = user_feed_events.feed_event_id 
INNER JOIN friendships 
ON user_feed_events.user_id = friendships.user_id 
WHERE friendships.user_id = 6 
AND friendships.accepted_at IS NOT NULL)

UNION DISTINCT

(SELECT 
DISTINCT additional_feed_events.id, 
additional_feed_events.event_type, 
additional_feed_events.type_id, 
additional_feed_events.data, 
additional_feed_events.created_at, 
additional_feed_events.updated_at, 
user_feed_events.user_id 
FROM feed_events AS additional_feed_events 
INNER JOIN user_feed_events 
ON additional_feed_events.id = user_feed_events.feed_event_id 
INNER JOIN friendships 
ON user_feed_events.user_id = friendships.friend_id 
WHERE friendships.friend_id = 6 
AND friendships.accepted_at IS NOT NULL) 

ORDER BY feed_events.created_at DESC"

But at the moment it is not working, and I'm also not sure it is the right way to do it!

Thanks, Augusto

Why do you use the IN list? Why don't you start from the selected user? Also, I think your left outer join is not needed:

SELECT 
DISTINCT feed_events.id, 
feed_events.event_type, 
feed_events.type_id, 
feed_events.data, 
feed_events.created_at, 
feed_events.updated_at, 
user_feed_events.user_id  
FROM 
(
  select friend_id from friendships where user_id = YOURUSER
  UNION
  select user_id as friend_id from friendships where friend_id = YOURUSER
) friendship
inner join user_feed_events 
on friendship.friend_id = user_feed_events.user_id
inner join feed_events
on user_feed_events.feed_event_id = feed_events.id
ORDER BY feed_events.created_at DESC
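To illustrate what the UNION subquery above computes, here is a small in-memory sketch in plain Ruby. The sample rows are made up and accepted_at is ignored; the sketch only shows that a friendship is picked up regardless of which side requested it, and that duplicates collapse the way UNION (as opposed to UNION ALL) collapses them.

```ruby
# Made-up rows of the friendships table: [user_id, friend_id]
friendships = [
  [6, 10],   # user 6 asked user 10
  [12, 6],   # user 12 asked user 6
  [10, 6],   # user 10 also asked user 6 (same pair, other direction)
  [3, 4]     # unrelated users
]

# In-memory equivalent of:
#   select friend_id from friendships where user_id = me
#   UNION
#   select user_id as friend_id from friendships where friend_id = me
def friend_ids(rows, me)
  by_me  = rows.select { |user_id, _| user_id == me }.map { |_, friend_id| friend_id }
  for_me = rows.select { |_, friend_id| friend_id == me }.map { |user_id, _| user_id }
  (by_me | for_me).sort   # Array#| drops duplicates, like UNION
end

p friend_ids(friendships, 6)  # => [10, 12]
```

User 10 appears only once in the result even though the pair is stored in both directions.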

If you want to stay with your original statement and just optimize it, then use this:

SELECT 
DISTINCT feed_events.id, 
feed_events.event_type, 
feed_events.type_id, 
feed_events.data, 
feed_events.created_at, 
feed_events.updated_at, 
user_feed_events.user_id  
FROM user_feed_events 
INNER JOIN feed_events 
ON feed_events.id = user_feed_events.feed_event_id 
WHERE user_feed_events.user_id IN (#{friends_id}) 
ORDER BY feed_events.created_at DESC

This removes the unnecessary LEFT JOIN.

Furthermore, please make sure that you have created indexes on the columns you use for the foreign keys.

OK, so the query itself isn't your problem here; your database needs to be set up so that this doesn't take longer than a few microseconds. First, though, the query. It should look like this:

 SELECT feed_events.id, 
        feed_events.event_type, 
        feed_events.type_id, 
        feed_events.data, 
        feed_events.created_at, 
        feed_events.updated_at, 
        user_feed_events.user_id  

   FROM feed_events
            INNER JOIN
        user_feed_events ON feed_events.id = user_feed_events.feed_event_id
            INNER JOIN
        user_friends     ON user_friends.friend_id = user_feed_events.user_id

  WHERE user_friends.user_id = ** The Id of the User in Question **
  ORDER BY feed_events.created_at DESC

Next, you need to make sure your id columns are primary keys and that there are unique indexes on things like (friend_id, user_id) in the user_friends table. By the way, I just made up those names; I tried to guess what you call the table you store friendships in.

select distinct fe.id, fe.event_type,
       fe.type_id, fe.data, fe.created_at,
       fe.updated_at, ufe.user_id
from friendships as f
    inner join user_feed_events as ufe on f.friend_id = ufe.user_id
    inner join feed_events as fe on ufe.feed_event_id = fe.id
where f.user_id = 6 and f.accepted_at is not null
order by fe.created_at desc

I'm not sure whether DISTINCT is really needed here. The query returns feed events for the friends of the specified user. At least I hope it does ;)

Edit: it turns out this solution is pretty much the same as the one Daniel Hilgarth proposed.
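On whether DISTINCT is needed: since the select list includes ufe.user_id, and assuming feed_events.id is unique, duplicate output rows can only appear if friendships stores the same (user_id, friend_id) pair more than once, or user_feed_events stores the same (user_id, feed_event_id) pair more than once. A small in-memory sketch of the first case in plain Ruby, with made-up rows:

```ruby
# Made-up data: the same friendship pair is stored twice.
friendships      = [[6, 10], [6, 10]]        # [user_id, friend_id]
user_feed_events = [[10, 100], [10, 101]]    # [user_id, feed_event_id]

# In-memory equivalent of joining friendships to user_feed_events
# on f.friend_id = ufe.user_id where f.user_id = 6.
rows = friendships.select { |user_id, _| user_id == 6 }.flat_map do |_, friend_id|
  user_feed_events.select { |uid, _| uid == friend_id }
end

p rows        # => [[10, 100], [10, 101], [10, 100], [10, 101]]
p rows.uniq   # => [[10, 100], [10, 101]]
```

If unique indexes rule out such duplicate pairs, the join alone should already return each event once per friend, and DISTINCT would only add sorting work.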

Use a sub-SELECT in the WHERE clause to build a list of feed events for an IN() call. Something (untested) like this:

SELECT fe.id, 
    fe.event_type, 
    fe.type_id, 
    fe.data, 
    fe.created_at, 
    fe.updated_at,
    ufe.user_id  
FROM feed_events AS fe, user_feed_events AS ufe
WHERE TRUE = TRUE
    AND fe.id = ufe.feed_event_id
    AND fe.id IN((
        -- feed events that belong to friends of :user_id
        SELECT ufe2.feed_event_id
        FROM user_feed_events AS ufe2, user_friends AS uf
        WHERE ufe2.user_id = uf.friend_id
            AND uf.user_id = :user_id
    ))
ORDER BY fe.created_at DESC;

I'd be curious to see what the EXPLAIN ANALYZE output looks like for this.
