Oracle: How to find overlaps in rows
Suppose I have the following table:
User_ID Activity_ID
123 222
123 333
124 222
124 224
124 333
125 224
125 333
I want to return a count of users for each combination of overlapping activities, such as the following:
Activity_ID_1 Activity_ID_2 Count_of_Users
222 333 2
224 333 2
In the above example, there are 2 users who completed both 222 AND 333.
I do not want to define each combination manually, since I am working with 93 different activity_ids. Is there a way to do this purely in Oracle SQL?
Assuming you have an activity table with activity IDs, and you want to count only DISTINCT users who had the same two activities (a user who has both activities twice is still counted only once):
select a1.activity_id, a2.activity_id, count(distinct f1.user_id)
from activity a1 inner join facts f1 on a1.activity_id = f1.activity_id
inner join facts f2 on f2.user_id = f1.user_id -- same user, second activity
inner join activity a2 on a2.activity_id = f2.activity_id
where a1.activity_id < a2.activity_id -- list each pair only once
group by a1.activity_id, a2.activity_id
having count(distinct f1.user_id) >= 2
;
facts is the name of your facts table (the one you show in your question).
EDIT: If the facts table (or view or subquery or whatever) is already "distinct"-ed by user_id, that is, each user appears at most once per activity, then delete "distinct" from my solution; this will make it more efficient. NOTE: "distinct" appears twice, once in SELECT and again in HAVING.
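The same pairing idea can also be written without the activity lookup table, by joining the facts table to itself on user_id. This is only a sketch under the answer's assumption that the question's table is called facts with columns user_id and activity_id; the output column aliases are illustrative.

```sql
-- Sketch: pair each activity with every higher-numbered activity of the same user.
-- Assumes a table facts(user_id, activity_id) as described in the answer above.
SELECT f1.activity_id             AS activity_id_1,
       f2.activity_id             AS activity_id_2,
       COUNT(DISTINCT f1.user_id) AS count_of_users
FROM   facts f1
       INNER JOIN facts f2
          ON  f2.user_id     = f1.user_id       -- same user did both activities
          AND f1.activity_id < f2.activity_id   -- list each pair only once
GROUP  BY f1.activity_id, f2.activity_id
HAVING COUNT(DISTINCT f1.user_id) >= 2          -- keep pairs shared by 2+ users
ORDER  BY activity_id_1, activity_id_2;
```

On the seven sample rows from the question, this should return two rows: 222 / 333 with 2 users and 224 / 333 with 2 users. The pair 222 / 224 is shared only by user 124, so the HAVING clause filters it out; drop that clause to see every pair that has at least one user in common.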
Oracle Setup:
CREATE TABLE data ( User_ID, Activity_ID ) AS
SELECT 123, 222 FROM DUAL UNION ALL
SELECT 123, 333 FROM DUAL UNION ALL
SELECT 124, 222 FROM DUAL UNION ALL
SELECT 124, 224 FROM DUAL UNION ALL
SELECT 124, 333 FROM DUAL UNION ALL
SELECT 125, 224 FROM DUAL UNION ALL
SELECT 125, 333 FROM DUAL;
CREATE TYPE INTLIST AS TABLE OF INT;
/
Query:
WITH Activities ( User_IDs, Activity_ID ) AS (
  SELECT CAST( COLLECT( User_ID ) AS INTLIST ),
         Activity_ID
  FROM   data
  GROUP BY Activity_ID
)
SELECT a.Activity_ID,
       b.Activity_ID,
       CARDINALITY( a.User_IDs MULTISET INTERSECT b.User_IDs ) AS "Count"
FROM   Activities a
       INNER JOIN
       Activities b
       ON (   CARDINALITY( a.User_IDs MULTISET INTERSECT b.User_IDs ) > 1
          AND a.Activity_ID < b.Activity_ID );
Output:
ACTIVITY_ID ACTIVITY_ID Count
----------- ----------- ----------
222 333 2
224 333 2
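One caveat with the collection approach: MULTISET INTERSECT defaults to ALL semantics, so a user who appears more than once under both activities would be counted more than once. If the data may contain such duplicate rows, a variant using MULTISET INTERSECT DISTINCT should avoid that. The sketch below assumes the data table and INTLIST type from the setup above; the column aliases are illustrative.

```sql
-- Sketch: same query, but duplicate user IDs are removed from the intersection
-- before CARDINALITY counts them.
-- Assumes the data table and the INTLIST type created in the setup above.
WITH Activities ( User_IDs, Activity_ID ) AS (
  SELECT CAST( COLLECT( User_ID ) AS INTLIST ),
         Activity_ID
  FROM   data
  GROUP BY Activity_ID
)
SELECT a.Activity_ID AS Activity_ID_1,
       b.Activity_ID AS Activity_ID_2,
       CARDINALITY( a.User_IDs MULTISET INTERSECT DISTINCT b.User_IDs ) AS Count_of_Users
FROM   Activities a
       INNER JOIN Activities b
          ON a.Activity_ID < b.Activity_ID   -- list each pair only once
WHERE  CARDINALITY( a.User_IDs MULTISET INTERSECT DISTINCT b.User_IDs ) >= 2;
```

With the sample data, which has no duplicate rows, this should give the same two rows as the output above.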