MySQL Multi-Row Sub Query fix

I am trying to return all data, for all users, recorded before a date/time that is unique to each user.

I can return all the data for the exact date for each user with no problem, but I don't know how to extend my query to return rows with dates less than or equal to that key date.

I have a two-table SQL database.

Table 1 - "sales" has a confirmation record of all transactions.

  • user_id (plus a date and various purchase information)

Table 2 - "user_data" has a record of all browsing data.

  • user_id, record_date, purchase_made (plus a whole bunch of user action data; purchase_made is misnamed, in that it means the user triggered the purchase popup but may not have actually made a purchase)

I need to get all the user_data records of users who have made purchases, up to and including their initial purchase.

To get the records for the exact purchase date, I can use the following:

SELECT * 
FROM user_data 
WHERE user_id IN (SELECT user_id FROM sales) 
AND record_date IN (SELECT min(record_date) FROM user_data WHERE purchase_made = 1 GROUP BY user_id);

Now all I really want is to change that second "IN" to "<=", but I can't do that with a subquery that returns more than one row.

Disclaimer: I didn't write the database, and I don't have permission to change it (nor would I be the right person to do so).

Disclaimer 2: I know that using purchase_made will foul up the results a bit, because not all of these rows will be purchases. But because of the reporting, the date from the sales table can be off by enough that it is actually LESS reliable than the timestamp associated with purchase_made (proven through extensive query comparisons run on the actual data set). The mandate from above is that it doesn't matter; they'll be happy with "mostly" accurate.

"Now all I really want is to be able to change that second "IN" to "<=""

So make it a correlated subquery. By correlating the inner query with the outer row on user_id, the subquery returns a single value per user, which removes the need for "IN" and lets you use <=.

SELECT * 
FROM user_data UD
WHERE user_id IN (SELECT user_id FROM sales) 
  AND record_date <= (SELECT MIN(record_date) 
                      FROM user_data UD2 
                      WHERE UD2.purchase_made = 1 
                        AND UD2.User_ID = UD.User_ID);
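
To sanity-check the correlated subquery without touching the real database, here is a minimal sketch that runs it against made-up rows. Everything in it is an assumption for illustration: the toy data, the text dates, and the use of an in-memory SQLite database as a stand-in for MySQL (the query itself is plain SQL, but MySQL may differ in details such as date types).

```python
import sqlite3

# Hypothetical toy data; SQLite is only a stand-in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (user_id INTEGER);
    CREATE TABLE user_data (user_id INTEGER, record_date TEXT, purchase_made INTEGER);
    INSERT INTO sales VALUES (1), (2);
    INSERT INTO user_data VALUES
        (1, '2020-01-01', 0),
        (1, '2020-01-05', 1),   -- user 1's first purchase_made row (key date)
        (1, '2020-01-09', 1),   -- after the key date, should be excluded
        (2, '2020-02-01', 1),   -- user 2's first purchase_made row (key date)
        (2, '2020-02-03', 0),   -- after the key date, should be excluded
        (3, '2020-01-01', 1);   -- user 3 is not in sales, should be excluded
""")

# Same shape as the correlated subquery above: the inner MIN() is
# evaluated per outer row's user_id, so it yields one value per user.
query = """
    SELECT UD.user_id, UD.record_date
    FROM user_data UD
    WHERE UD.user_id IN (SELECT user_id FROM sales)
      AND UD.record_date <= (SELECT MIN(UD2.record_date)
                             FROM user_data UD2
                             WHERE UD2.purchase_made = 1
                               AND UD2.user_id = UD.user_id)
    ORDER BY UD.user_id, UD.record_date
"""
rows = conn.execute(query).fetchall()
print(rows)
```

With this toy data, only the rows on or before each user's own first purchase_made date come back, and user 3 is dropped because there is no matching sales row.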

Another approach is to add user_id to the subquery's select list and, instead of using it as a subquery in the WHERE clause, turn it into an inline view (derived table) that you join to on user_id, comparing the dates in the join condition.

SELECT UD.* 
FROM 
    user_data UD
INNER JOIN 
   (SELECT MIN(record_date) AS mrd, user_id 
    FROM user_data 
    WHERE purchase_made = 1
    GROUP BY user_id) UD2 ON UD.user_id = UD2.user_id
                          AND UD.record_date <= UD2.mrd
WHERE 
    UD.user_id IN (SELECT user_id FROM sales);

I would consider changing the WHERE clause to:

WHERE 
    EXISTS (SELECT NULL FROM sales S WHERE S.user_id = UD.user_id)

I suspect the sales table will keep getting larger and IN may degrade in performance over time, whereas EXISTS can exit the subquery early, as soon as it finds the first matching row, which may result in better performance.
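
The join and the EXISTS filter can be tried together on made-up rows. This is only a sketch under stated assumptions: the toy data is hypothetical, and an in-memory SQLite database stands in for MySQL, so it shows which rows are returned, not how either engine performs.

```python
import sqlite3

# Hypothetical toy data; SQLite is only a stand-in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (user_id INTEGER);
    CREATE TABLE user_data (user_id INTEGER, record_date TEXT, purchase_made INTEGER);
    INSERT INTO sales VALUES (1), (1), (2), (4);   -- user 1 has two sales rows
    INSERT INTO user_data VALUES
        (1, '2020-01-01', 0),
        (1, '2020-01-05', 1),   -- user 1's key date
        (1, '2020-01-09', 1),   -- after the key date
        (2, '2020-02-01', 1),   -- user 2's key date
        (2, '2020-02-03', 0),   -- after the key date
        (3, '2020-01-01', 1),   -- user 3 has no sales row
        (4, '2020-03-01', 0);   -- user 4 never has purchase_made = 1
""")

# Inline view (derived table) with one key date per user, joined on
# user_id with the date comparison in the join condition, then filtered
# to users that have at least one sales row via EXISTS.
join_query = """
    SELECT UD.user_id, UD.record_date
    FROM user_data UD
    INNER JOIN (SELECT user_id, MIN(record_date) AS mrd
                FROM user_data
                WHERE purchase_made = 1
                GROUP BY user_id) UD2
            ON UD.user_id = UD2.user_id
           AND UD.record_date <= UD2.mrd
    WHERE EXISTS (SELECT NULL FROM sales S WHERE S.user_id = UD.user_id)
    ORDER BY UD.user_id, UD.record_date
"""
join_rows = conn.execute(join_query).fetchall()
print(join_rows)
```

Two things are worth noticing in this toy run: the duplicate sales rows for user 1 do not duplicate the output, because EXISTS only filters and never joins, and user 4 drops out because the inner join finds no key date for them.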
