SQL: Suggested friends with 1 degree of separation where my friends share more than 2 mutual friends

I'm having trouble writing a query that suggests friends to my end users when their friends share more than one friend in common. The current schema is far from optimal, but my boss is adamant that I am not allowed to alter the table structure, even though I've told him that storing a friend relationship in both directions is much faster to query than storing it in only one.

We currently have one pair of values for each friendship:

friendID  |  Entity_ID1  |  Entity_Id2
   1              2             3
   2              1             4
   3              2             5

I know that storing the inverse of each row would make my query much simpler. So far I have devised the following query to attempt to find suggested friends for a user:

  SELECT DISTINCT Entity_Id, Fb_Id, First_Name, Last_Name, Profile_Pic_Url, Last_CheckIn_Place, Category
  FROM entity
  JOIN friends F1
  ON entity.Entity_Id = F1.Entity_Id2 OR entity.Entity_Id = F1.Entity_Id1
  /* Friends of Friends */
  WHERE F1.Entity_Id2 IN
  (
    SELECT Entity_Id1
      FROM friends F
     WHERE F.Entity_Id2 = :userId
       AND F.Category != 4

     UNION

     SELECT Entity_Id2
      FROM friends F
     WHERE F.Entity_Id1 = :userId
       AND F.Category != 4
  )
  /* Exclude my friends */
  AND F1.Entity_Id1 NOT IN
  (
    SELECT Entity_Id1
      FROM friends F
     WHERE F.Entity_Id2 = :userId
       AND F.Category != 4

     UNION

     SELECT Entity_Id2
      FROM friends F
     WHERE F.Entity_Id1 = :userId
       AND F.Category != 4
  )
  /* Exclude self */
  AND F1.Entity_Id1 != :userId
  GROUP BY Entity_Id

  /* Perform again for userId 2 */
  UNION

  SELECT DISTINCT Entity_Id, Fb_Id, First_Name, Last_Name, Profile_Pic_Url, Last_CheckIn_Place, Category
  FROM entity
  JOIN friends F2
  ON entity.Entity_Id = F2.Entity_Id2 OR entity.Entity_Id = F2.Entity_Id1
  WHERE F2.Entity_Id1 IN
  (
    SELECT Entity_Id1
      FROM friends F
     WHERE F.Entity_Id2 = :userId
       AND F.Category != 4

     UNION

     SELECT Entity_Id2
      FROM friends F
     WHERE F.Entity_Id1 = :userId
       AND F.Category != 4
  )
  /* Exclude my friends */
  AND F2.Entity_Id2 NOT IN
  (
    SELECT Entity_Id1
      FROM friends F
     WHERE F.Entity_Id2 = :userId
       AND F.Category != 4

     UNION

     SELECT Entity_Id2
      FROM friends F
     WHERE F.Entity_Id1 = :userId
       AND F.Category != 4
  )
  AND F2.Entity_Id2 != :userId
  GROUP BY Entity_Id

This sort of works; however, it returns users that I am already friends with, which is not what I want. I thought that having the NOT IN () clause for my friends, and then using UNION to merge the results, would strip my friends out, but apparently it does not.

What am I doing wrong here, and is there any way to make this query shorter without modifying the schema? Right now it seems far too long and rather unmanageable.

Missing the reciprocal relationship does make this much harder, because every lookup has to check both directions of the relationship. You seem to be pursuing a strategy of using union to reconstruct both sides of the relationship.
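To make "reconstructing both sides" concrete, here is a minimal sketch of a derived edge list that emits every friendship once per direction, without altering the table. It runs against an in-memory SQLite database through Python's sqlite3 only so that it is self-contained. The three rows are the ones from the question; the Category column on friends, the placeholder Category value 1, and the names `edges`, `src`, `dst` and `friends_of` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE friends (
        friendID   INTEGER PRIMARY KEY,
        Entity_Id1 INTEGER NOT NULL,
        Entity_Id2 INTEGER NOT NULL,
        Category   INTEGER NOT NULL
    );
    -- Rows from the question; Category = 1 is an assumed placeholder value.
    INSERT INTO friends VALUES (1, 2, 3, 1), (2, 1, 4, 1), (3, 2, 5, 1);
""")

# Emit each friendship twice, once per direction, so that later joins
# only ever have to look at (src, dst).
SYMMETRIC_EDGES = """
    SELECT Entity_Id1 AS src, Entity_Id2 AS dst FROM friends WHERE Category <> 4
    UNION ALL
    SELECT Entity_Id2 AS src, Entity_Id1 AS dst FROM friends WHERE Category <> 4
"""

def friends_of(conn, user_id):
    """Return the direct friends of user_id, sorted by id."""
    rows = conn.execute(
        f"SELECT dst FROM ({SYMMETRIC_EDGES}) AS edges "
        "WHERE src = :user_id ORDER BY dst",
        {"user_id": user_id},
    )
    return [row[0] for row in rows]

print(friends_of(conn, 2))  # [3, 5]
```

With the relationship available in both directions, "my friends" becomes a single filter on `src` instead of a union repeated in every subquery.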

Alternatively, you can use exists and subqueries. The following version uses not exists to filter out entities that are already friends, and a correlated count subquery to keep those that have at least two friends in common:

select e.*
from entity e
where e.entity_id <> :user_id and
      not exists (select 1
                  from friends f
                  where f.category <> 4 and
                        :user_id in (f.entity_id1, f.entity_id2) and
                        e.entity_id in (f.entity_id1, f.entity_id2)
                 ) and
      (select count(*)
       from friends f1 join
            friends f2
            on f1.entity_id1 = f2.entity_id1 or
               f1.entity_id1 = f2.entity_id2 or
               f1.entity_id2 = f2.entity_id1 or
               f1.entity_id2 = f2.entity_id2
       where :user_id in (f1.entity_id1, f1.entity_id2, f2.entity_id1, f2.entity_id2) and
             e.entity_id in (f1.entity_id1, f1.entity_id2, f2.entity_id1, f2.entity_id2)
      ) >= 2

Hopefully, you don't have too much data. Neither this version nor the version you are attempting will have good performance on larger amounts of data.
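For comparison, here is a sketch of a different formulation from the query above: build the two-directional edge list in a CTE, walk two hops out from the user, and count the distinct intermediaries with GROUP BY and HAVING. It assumes a database that supports CTEs (SQLite is used only to make the example runnable), a Category column on friends, and invented sample data; the names `edges`, `mine`, `fof` and `suggest_friends` are hypothetical, and the threshold of 2 follows the "at least two friends in common" reading.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE friends (
        friendID   INTEGER PRIMARY KEY,
        Entity_Id1 INTEGER NOT NULL,
        Entity_Id2 INTEGER NOT NULL,
        Category   INTEGER NOT NULL
    );
    -- Invented sample data: user 1 is friends with 2, 3 and 6.
    -- 4 knows both 2 and 3; 5 knows only 2;
    -- 6 knows 2 and 3 but is already a friend of 1.
    INSERT INTO friends (Entity_Id1, Entity_Id2, Category) VALUES
        (1, 2, 1), (3, 1, 1), (1, 6, 1),
        (2, 4, 1), (4, 3, 1),
        (5, 2, 1),
        (6, 2, 1), (3, 6, 1);
""")

SUGGESTIONS = """
WITH edges AS (
    -- Every friendship in both directions, blocked category excluded.
    SELECT Entity_Id1 AS src, Entity_Id2 AS dst FROM friends WHERE Category <> 4
    UNION
    SELECT Entity_Id2 AS src, Entity_Id1 AS dst FROM friends WHERE Category <> 4
)
SELECT fof.dst AS suggested_id,
       COUNT(DISTINCT fof.src) AS mutual_friends
FROM edges AS mine
JOIN edges AS fof ON fof.src = mine.dst      -- second hop
WHERE mine.src = :user_id
  AND fof.dst <> :user_id                    -- exclude self
  AND NOT EXISTS (                           -- exclude existing friends
        SELECT 1 FROM edges AS direct
        WHERE direct.src = :user_id AND direct.dst = fof.dst
      )
GROUP BY fof.dst
HAVING COUNT(DISTINCT fof.src) >= 2
ORDER BY mutual_friends DESC, suggested_id
"""

def suggest_friends(conn, user_id):
    """Return (suggested_id, mutual_friends) rows for user_id."""
    return conn.execute(SUGGESTIONS, {"user_id": user_id}).fetchall()

print(suggest_friends(conn, 1))  # [(4, 2)]
```

Joining the result back to entity on Entity_Id would supply the profile columns from the original query. The performance caveat still applies: the edge list is rebuilt from friends on every call.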
