SQL Server: Generate unique customer key based on two columns
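I'm cleaning up a customer list from an e-commerce site. The list has a many-to-many relationship between customer ID and customer email. For example, a customer can place orders with the same email both while logged in and anonymously, which produces two customer records with the same email but different customer IDs. Likewise, a customer can place orders with two different emails while logged in, which produces customer records with the same ID but different emails. Given this, I want to build a customer list with a truly unique ID based on either the email or the customer ID. There are also cases where the email is blank, so records that both have a blank email but different IDs must be treated as two different customers.
So given something like this: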
CUST_ID   CUST_EMAIL
------------------------
123       test1@gmail.com
123       test2@gmail.com
124       test3@gmail.com
125       test3@gmail.com
126
127
128       test4@gmail.com
128       test5@gmail.com
129       test4@gmail.com
I want to generate a key like this:
CUST_ID   CUST_EMAIL        NEW_CUST_KEY
------------------------------------------
123       test1@gmail.com   1
123       test2@gmail.com   1
124       test3@gmail.com   2
125       test3@gmail.com   2
126                         3
127                         4
128       test4@gmail.com   5
128       test5@gmail.com   5
129       test4@gmail.com   5
OLDTABLE is your existing table; #NEWTABLE will hold the result.
CREATE TABLE #NEWTABLE
(
    NEW_CUST_KEY int not null,
    CUST_ID      int not null,
    CUST_EMAIL   nvarchar(100) null
);

-- 1. Give every distinct (CUST_ID, CUST_EMAIL) pair a provisional key.
INSERT INTO #NEWTABLE (NEW_CUST_KEY, CUST_ID, CUST_EMAIL)
SELECT ROW_NUMBER() OVER (ORDER BY CUST_ID, CUST_EMAIL) AS NEW_CUST_KEY, CUST_ID, CUST_EMAIL
FROM
(
    SELECT CUST_ID, CUST_EMAIL
    FROM OLDTABLE
    GROUP BY CUST_ID, CUST_EMAIL
) T;

-- 2. Rows that share a CUST_ID take the lowest key among them.
UPDATE Upd SET NEW_CUST_KEY = T.NEW_CUST_KEY
FROM #NEWTABLE Upd
JOIN (
    SELECT CUST_ID, MIN(NEW_CUST_KEY) AS NEW_CUST_KEY
    FROM #NEWTABLE
    GROUP BY CUST_ID) T
    ON Upd.CUST_ID = T.CUST_ID;

-- 3. Rows that share an email take the lowest key among them.
--    NULLIF turns '' into NULL, and NULL never equals NULL, so blank emails are not merged.
UPDATE Upd SET NEW_CUST_KEY = T.NEW_CUST_KEY
FROM #NEWTABLE Upd
JOIN (
    SELECT CUST_EMAIL, MIN(NEW_CUST_KEY) AS NEW_CUST_KEY
    FROM #NEWTABLE
    GROUP BY CUST_EMAIL) T
    ON NULLIF(Upd.CUST_EMAIL, '') = NULLIF(T.CUST_EMAIL, '');

-- 4. Renumber the surviving keys so they run 1, 2, 3, ... without gaps.
UPDATE Upd SET NEW_CUST_KEY = T.CHANGE_CUST_KEY
FROM #NEWTABLE Upd
JOIN (
    SELECT NEW_CUST_KEY, ROW_NUMBER() OVER (ORDER BY NEW_CUST_KEY) AS CHANGE_CUST_KEY
    FROM #NEWTABLE
    GROUP BY NEW_CUST_KEY) T
    ON Upd.NEW_CUST_KEY = T.NEW_CUST_KEY;

SELECT * FROM #NEWTABLE;
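One thing to watch with the approach above: it runs the CUST_ID pass and the CUST_EMAIL pass once each. That is enough for the sample data, but longer chains can be left split. For example, with the rows (1, a), (2, a), (2, b), (3, b), the email pass assigns different keys to (2, a) and (2, b) even though they share an ID. A minimal sketch of one way around this is to repeat both passes until nothing changes. The GRP column and the @changed variable are names made up for this illustration, and the sample rows are the ones from the question:

```sql
-- Sketch only: GRP and @changed are illustrative names, not part of the original schema.
DECLARE @Customers TABLE
(
    CUST_ID    INT          NOT NULL,
    CUST_EMAIL VARCHAR(100) NULL,
    GRP        INT          NULL      -- working group label
);

INSERT INTO @Customers (CUST_ID, CUST_EMAIL)
VALUES (123, 'test1@gmail.com'), (123, 'test2@gmail.com'),
       (124, 'test3@gmail.com'), (125, 'test3@gmail.com'),
       (126, ''), (127, ''),
       (128, 'test4@gmail.com'), (128, 'test5@gmail.com'),
       (129, 'test4@gmail.com');

-- Start with one group per CUST_ID.
UPDATE @Customers SET GRP = CUST_ID;

DECLARE @changed INT = 1;

WHILE @changed > 0
BEGIN
    -- Rows sharing a CUST_ID take the lowest label seen for that ID.
    UPDATE C SET GRP = M.MIN_GRP
    FROM @Customers AS C
    JOIN (SELECT CUST_ID, MIN(GRP) AS MIN_GRP
          FROM @Customers
          GROUP BY CUST_ID) AS M
        ON M.CUST_ID = C.CUST_ID
    WHERE C.GRP > M.MIN_GRP;
    SET @changed = @@ROWCOUNT;

    -- Rows sharing a non-empty email take the lowest label seen for that email.
    -- The <> '' filter also drops NULL emails, so blank emails never merge.
    UPDATE C SET GRP = M.MIN_GRP
    FROM @Customers AS C
    JOIN (SELECT CUST_EMAIL, MIN(GRP) AS MIN_GRP
          FROM @Customers
          WHERE CUST_EMAIL <> ''
          GROUP BY CUST_EMAIL) AS M
        ON M.CUST_EMAIL = C.CUST_EMAIL
    WHERE C.GRP > M.MIN_GRP;
    SET @changed = @changed + @@ROWCOUNT;
END;

-- Turn the labels into gap-free keys.
SELECT CUST_ID, CUST_EMAIL,
       DENSE_RANK() OVER (ORDER BY GRP) AS NEW_CUST_KEY
FROM @Customers
ORDER BY CUST_ID, CUST_EMAIL;
```

On the sample rows this should give keys 1, 1, 2, 2, 3, 4, 5, 5, 5, the same as the expected output in the question. The loop stops once a full round updates no rows, so the number of rounds grows with the longest chain of linked IDs and emails; on a large table it would be worth indexing CUST_ID and CUST_EMAIL on a real temp table rather than using a table variable.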
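I think you could use ROW_NUMBER, something like this:
-- ROW_NUMBER requires an ORDER BY inside OVER; replace YourTable with your own table.
SELECT DISTINCT CUST_ID, CUST_EMAIL,
       ROW_NUMBER() OVER (PARTITION BY CUST_ID, CUST_EMAIL ORDER BY CUST_ID) AS New_Cust_Key
FROM YourTable;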
I tried to map your users and their IDs to their emails and vice versa, so I created this Frankenstein's monster of a query:
DECLARE @Customers TABLE
(
    CUST_ID INT
    , CUST_EMAIL VARCHAR(20)
);

INSERT INTO @Customers (CUST_ID, CUST_EMAIL)
VALUES (123, 'test1@gmail.com')
    , (123, 'test2@gmail.com')
    , (124, 'test3@gmail.com')
    , (125, 'test3@gmail.com')
    , (126, '')
    , (127, '')
    , (128, 'test4@gmail.com')
    , (128, 'test5@gmail.com')
    , (129, 'test4@gmail.com');

SELECT DISTINCT C.CUST_ID
    , C.CUST_EMAIL
    , DENSE_RANK() OVER(ORDER BY T.CUST_ID) AS NEW_CUST_KEY
FROM @Customers AS C
INNER JOIN (
    -- All rows, minus those whose non-empty email already appears under a lower CUST_ID
    SELECT CUST_ID, CUST_EMAIL
    FROM @Customers
    EXCEPT
    SELECT C2.CUST_ID, C2.CUST_EMAIL
    FROM @Customers AS C1
    INNER JOIN @Customers AS C2
        ON C2.CUST_EMAIL = C1.CUST_EMAIL
        AND C2.CUST_ID > C1.CUST_ID
        AND C1.CUST_EMAIL <> ''
    ) AS T
    ON CASE
        -- Blank email: match only the row itself
        WHEN (T.CUST_ID = C.CUST_ID AND T.CUST_EMAIL = C.CUST_EMAIL AND T.CUST_EMAIL = '') THEN 1
        -- Non-blank email: match on either the ID or the email
        WHEN (T.CUST_ID = C.CUST_ID OR T.CUST_EMAIL = C.CUST_EMAIL) AND T.CUST_EMAIL <> '' THEN 1
        ELSE 0
    END = 1;
With the test data, it does appear to produce what you expect:
╔═════════╦═════════════════╦═══════════════╗
║ CUST_ID ║ CUST_EMAIL ║ NEW_CUST_KEY ║
╠═════════╬═════════════════╬═══════════════╣
║ 123 ║ test1@gmail.com ║ 1 ║
║ 123 ║ test2@gmail.com ║ 1 ║
║ 124 ║ test3@gmail.com ║ 2 ║
║ 125 ║ test3@gmail.com ║ 2 ║
║ 126 ║ ║ 3 ║
║ 127 ║ ║ 4 ║
║ 128 ║ test4@gmail.com ║ 5 ║
║ 128 ║ test5@gmail.com ║ 5 ║
║ 129 ║ test4@gmail.com ║ 5 ║
╚═════════╩═════════════════╩═══════════════╝
You can see this running live on data.stackexchange.com.
Let me know whether this works on your actual database.
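When trying any of these queries on real data, a quick consistency check can help. The sketch below assumes the candidate result has been saved into a temp table called #Keyed; that name is hypothetical, and the rows loaded here are just the expected output from the question. Both checks should return no rows if no CUST_ID and no non-empty email has ended up under more than one key:

```sql
-- Hypothetical holding table for a candidate result (the name #Keyed is made up).
CREATE TABLE #Keyed
(
    CUST_ID      INT          NOT NULL,
    CUST_EMAIL   VARCHAR(100) NULL,
    NEW_CUST_KEY INT          NOT NULL
);

INSERT INTO #Keyed (CUST_ID, CUST_EMAIL, NEW_CUST_KEY)
VALUES (123, 'test1@gmail.com', 1), (123, 'test2@gmail.com', 1),
       (124, 'test3@gmail.com', 2), (125, 'test3@gmail.com', 2),
       (126, '', 3), (127, '', 4),
       (128, 'test4@gmail.com', 5), (128, 'test5@gmail.com', 5),
       (129, 'test4@gmail.com', 5);

-- Check 1: any CUST_ID that was split across more than one key.
SELECT CUST_ID, COUNT(DISTINCT NEW_CUST_KEY) AS KEY_COUNT
FROM #Keyed
GROUP BY CUST_ID
HAVING COUNT(DISTINCT NEW_CUST_KEY) > 1;

-- Check 2: any non-empty email that was split across more than one key.
SELECT CUST_EMAIL, COUNT(DISTINCT NEW_CUST_KEY) AS KEY_COUNT
FROM #Keyed
WHERE CUST_EMAIL <> ''
GROUP BY CUST_EMAIL
HAVING COUNT(DISTINCT NEW_CUST_KEY) > 1;

DROP TABLE #Keyed;
```

These checks only catch customers that were split when they should have been merged. They do not detect the opposite mistake, where unrelated customers were merged under one key.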