Is this complex query possible in MySQL or do I need PHP?

I'm planning a db-driven website that matches users based on how they answer questions. I'm thinking the best approach is to run the match calculations in the SELECT query, but I have no idea how to write the query.

Let's say I have a table called user_answer that looks like this:

+--------+-------------+--------+------------------+--------+
| userid | question_id | answer | preferred_answer | weight |
+--------+-------------+--------+------------------+--------+
| 1      | 20          | 3      |                  | 0      |
| 1      | 24          | 3      | 2, 3             | 1      |
| 1      | 36          | 2      | 2                | 10     |
| 1      | 37          | 3      | 1, 2, 3          | 50     |
| 1      | 40          | 3      | 3                | 250    |
| 2      | 20          | 3      | 3                | 10     |
| 2      | 24          | 3      | 2                | 1      |
| 2      | 25          | 2      |                  | 0      |
| 2      | 26          | 2      |                  | 0      |
| 2      | 40          | 3      | 2                | 250    |
+--------+-------------+--------+------------------+--------+

I want to select and order by match_percentage. match_percentage should be calculated this way:

  1. Given userid = 1 (current_user)
  2. Select users with matching question_id's (match_user, here userid = 2)
  3. total_weight1 = sum of weight of the matching question_id's for current_user
  4. If answer of match_user is in preferred_answer of current_user, match1_weight = match1_weight + weight of current_user
  5. total_weight2 = sum of weight of the matching question_id's for match_user
  6. If answer of current_user is in preferred_answer of match_user, match2_weight = match2_weight + weight of match_user
  7. match_percentage = sqrt((match1_weight / total_weight1) * (match2_weight / total_weight2))

I don't know if this is possible. I'm expecting the DB to grow to be very large, so loading them all and doing the calculations in PHP may not be the best choice - but correct me if I'm wrong.

Is it possible to make all these calculations in a query?

Yes, I believe all the specified calculations can be performed in a query.

Assuming that (userid, question_id) is UNIQUE, we start by finding the userids with "matching" questions. We could do that with a query like this:

SELECT u.answer
     , u.preferred_answer
     , u.weight
     , m.userid           AS m_userid
     , m.question_id      AS m_question_id
     , m.answer           AS m_answer
     , m.preferred_answer AS m_preferred_answer
     , m.weight           AS m_weight
  FROM user_answer u
  JOIN user_answer m
    ON m.question_id = u.question_id
   AND m.userid <> u.userid
   AND u.userid = 1 
 ORDER
    BY m.userid
     , m.question_id

Once we have that working, we can work on getting the total weights and the calculations from those.

Assuming the preferred_answer column is VARCHAR type and contains a comma-separated list of elements with no spaces, e.g. '2' or '2,3,5', you could use the MySQL FIND_IN_SET function to return the index position of a particular element within the list. It returns 0 if a "match" is not found.
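The "no spaces" assumption matters, because the sample rows in the question are shown with a space after each comma ('2, 3'). A quick way to see the effect is the sketch below; the column aliases are only for illustration, and stripping spaces with REPLACE is one possible workaround, not something the question specifies.

```sql
-- FIND_IN_SET compares whole elements, so a leading space becomes part of the element.
SELECT FIND_IN_SET('3', '2,3')                    AS no_spaces      -- 2
     , FIND_IN_SET('3', '2, 3')                   AS with_space     -- 0, the element is ' 3'
     , FIND_IN_SET('3', REPLACE('2, 3', ' ', '')) AS spaces_removed -- 2
     , FIND_IN_SET('3', '')                       AS empty_list;    -- 0
```

If the stored values do contain spaces, it is probably cleaner to remove them once when the row is written than to wrap every comparison in REPLACE.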

I believe this query meets the specification.

SELECT m.userid           AS m_userid
     , SUM(u.weight)      AS total_weight1
     , SUM(IF(FIND_IN_SET(m.answer,u.preferred_answer),u.weight,0)) AS match1_weight
     , SUM(m.weight)      AS total_weight2
     , SUM(IF(FIND_IN_SET(u.answer,m.preferred_answer),m.weight,0)) AS match2_weight
     , SQRT(
         ( SUM(IF(FIND_IN_SET(m.answer,u.preferred_answer),u.weight,0)) / SUM(u.weight) )
       * ( SUM(IF(FIND_IN_SET(u.answer,m.preferred_answer),m.weight,0)) / SUM(m.weight) )
       ) AS match_percentage
  FROM user_answer u
  JOIN user_answer m
    ON m.question_id = u.question_id
   AND m.userid <> u.userid
   AND u.userid = 1 
 GROUP
    BY m.userid
 ORDER
    BY match_percentage DESC
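To try the query against the sample rows from the question, a setup along these lines should work. The column types, the primary key, and storing "no preference" as an empty string are all assumptions, and the preferred_answer values are entered without spaces so that FIND_IN_SET can match them.

```sql
-- Hypothetical table definition; the question does not give column types.
CREATE TABLE user_answer (
  userid           INT          NOT NULL,
  question_id      INT          NOT NULL,
  answer           INT          NOT NULL,
  preferred_answer VARCHAR(255) NOT NULL DEFAULT '',
  weight           INT          NOT NULL DEFAULT 0,
  PRIMARY KEY (userid, question_id)
);

-- Sample rows from the question, with spaces removed from preferred_answer.
INSERT INTO user_answer (userid, question_id, answer, preferred_answer, weight) VALUES
  (1, 20, 3, '',      0),
  (1, 24, 3, '2,3',   1),
  (1, 36, 2, '2',     10),
  (1, 37, 3, '1,2,3', 50),
  (1, 40, 3, '3',     250),
  (2, 20, 3, '3',     10),
  (2, 24, 3, '2',     1),
  (2, 25, 2, '',      0),
  (2, 26, 2, '',      0),
  (2, 40, 3, '2',     250);
```

Traced by hand, users 1 and 2 share questions 20, 24 and 40, so the query for userid = 1 should return a single row for m_userid = 2 with total_weight1 = 251, match1_weight = 251, total_weight2 = 261, match2_weight = 10, and match_percentage = SQRT((251/251) * (10/261)), roughly 0.196.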

NOTE:

These queries are desk-checked only. I didn't set up a SQL Fiddle to test.

Item 4 appears to be a total of the current_user weight, but only including matching answers. If there are no matching answers, we're going to return 0. The same goes for item 6, just in the opposite direction.

If there are no matching questions between userid 1 and some other userid, then no row will be returned for the other userid.
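One more edge case: if every shared question has weight 0 for one of the two users, the corresponding total is 0 and the division has no defined result. In a plain SELECT, MySQL returns NULL for a division by zero, so match_percentage would be NULL for that row. If a 0 is preferred, a variant along these lines might do; treating "no weight at all" as a 0% match is an assumption, not part of the original specification.

```sql
-- NULLIF turns a zero total into NULL explicitly; COALESCE maps the NULL result to 0.
SELECT m.userid AS m_userid
     , COALESCE(
         SQRT(
           ( SUM(IF(FIND_IN_SET(m.answer,u.preferred_answer),u.weight,0)) / NULLIF(SUM(u.weight),0) )
         * ( SUM(IF(FIND_IN_SET(u.answer,m.preferred_answer),m.weight,0)) / NULLIF(SUM(m.weight),0) )
         )
       , 0) AS match_percentage
  FROM user_answer u
  JOIN user_answer m
    ON m.question_id = u.question_id
   AND m.userid <> u.userid
   AND u.userid = 1
 GROUP
    BY m.userid
 ORDER
    BY match_percentage DESC;
```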

For a large set, this could potentially crank for a while. Suitable covering indexes should improve performance.
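As a rough sketch of what "covering" could mean here: the u side is looked up by userid and the m side by question_id, and both sides read answer, weight and preferred_answer. The index names below are hypothetical, and whether the optimizer actually uses them should be checked with EXPLAIN on real data.

```sql
-- Serves the lookup of the current user's rows (u.userid = 1).
CREATE INDEX user_answer_ix1
    ON user_answer (userid, question_id, answer, weight, preferred_answer);

-- Serves the join to other users' rows on the same question (m.question_id = u.question_id).
CREATE INDEX user_answer_ix2
    ON user_answer (question_id, userid, answer, weight, preferred_answer);
```

Including preferred_answer makes the indexes wider, so it may be worth comparing against narrower indexes that leave that column out.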

For improved query performance, you may want to consider "caching" the result of this query in a separate table. The contents of the "cache" table would only need to be refreshed when a row in the original table is inserted, updated, or deleted. And the previously calculated results might still be "good enough" for normal access.

If you stored the results, you'd also want to return u.userid as a column in the SELECT list and include it in the GROUP BY.
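A minimal sketch of such a cache, assuming a hypothetical table named user_match and a refresh for one user at a time, might look like this. NULLIF is used on the totals because a division by zero inside an INSERT or REPLACE can raise an error under strict SQL mode, rather than quietly returning NULL as it does in a plain SELECT.

```sql
-- Hypothetical cache table: one row per (current user, matched user) pair.
CREATE TABLE user_match (
  userid           INT    NOT NULL,
  match_userid     INT    NOT NULL,
  match_percentage DOUBLE NULL,
  PRIMARY KEY (userid, match_userid)
);

-- Recompute the cached matches for userid = 1, overwriting any existing rows.
REPLACE INTO user_match (userid, match_userid, match_percentage)
SELECT u.userid
     , m.userid
     , SQRT(
         ( SUM(IF(FIND_IN_SET(m.answer,u.preferred_answer),u.weight,0)) / NULLIF(SUM(u.weight),0) )
       * ( SUM(IF(FIND_IN_SET(u.answer,m.preferred_answer),m.weight,0)) / NULLIF(SUM(m.weight),0) )
       )
  FROM user_answer u
  JOIN user_answer m
    ON m.question_id = u.question_id
   AND m.userid <> u.userid
 WHERE u.userid = 1
 GROUP
    BY u.userid
     , m.userid;

-- Reading from the cache is then a simple indexed lookup.
SELECT match_userid, match_percentage
  FROM user_match
 WHERE userid = 1
 ORDER
    BY match_percentage DESC;
```

REPLACE only overwrites or adds rows, so pairs that no longer share any question would need to be deleted separately, for example by deleting the user's rows before the refresh.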
