简体   繁体   中英

MySQL Multi-Row Sub Query fix

I am trying to return all data, from all users, before a date/time which is unique to each user.

I can return all the data for the exact date for each user, no problem, but I don't know how to expand my query to return dates less than or equal to my key date.

I have a 2 table SQL database.

Table 1 - "sales" has a confirmation record of all transactions.

  • user_id (date, and then various purchase information)

Table 2 - "user_data" has a record of all browsing data

  • user_id, record_date, purchase_made (purchase_made is misnamed, in that they triggered the purchase popup, but may not have actually made a purchase -- and then a whole bunch of user action data)

I need to get all the user_data records of users who have made purchases, prior to and including their initial purchase.

In order to get the exact purchase records I can use the following:

SELECT * 
FROM user_data 
WHERE user_id IN (SELECT user_id FROM sales) 
AND record_date IN (SELECT min(record_date) FROM user_data WHERE purchase_made = 1 GROUP BY user_id);

Now all I really want is to be able to change that second "IN" to "<=" but I can't do that with a Subquery that returns more than 1 row.

Disclaimer -- I didn't write the database, I don't have permission to change the database (nor would I be the right person to do so)

Disclaimer 2 -- I know that using purchase_made will foul up the results a bit because not all of these will be purchases. But because of the reporting, the date from the sales table can be off by enough time that it is actually LESS reliable than using the timestamp associated with purchase_made (proven through extensive query comparisons run on the actual data set). But the mandate from above is that it doesn't matter, they'll be happy with "mostly" accurate.

"Now all I really want is to be able to change that second "IN" to "<="

So make it a correlated subquery by correlating the records we eliminate the need for the "IN" and now can use <=

SELECT * 
FROM user_data UD
WHERE user_id IN (SELECT user_id FROM sales) 
  AND record_date <= (SELECT MIN(record_date) 
                      FROM user_data UD2 
                      WHERE UD2.purchase_made = 1 
                        AND UD2.User_ID = UD.User_ID);

Another approach would be to add user_Id to the subquery and instead of a subquery make it a inline view that you join to based on the user_ID and then on the join criteria compare the dates.

SELECT * 
FROM 
    user_data UD
INNER JOIN 
   (SELECT MIN(record_date) mrd, User_ID 
    FROM user_data 
    WHERE purchase_made = 1
    GROUP BY User_ID) UD2 ON UD.user_Id = UD2.user_Id
                          AND UD.record_date <= UD2.RecordDate
WHERE 
    user_id IN (SELECT user_id FROM sales) 

I would consider changing the where clause to...

WHERE 
    user_ID EXISTS (SELECT NULL FROM sales S WHERE S.user_ID = UD.user_ID)

I suspect sales table to get larger and larger and IN will degrade in performance over time, where as exists can early exist the subquery resulting in an improved performance.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM