SQL count duplicates in another column based on one field per row

I am building out a customer retention report. We identify customers by their email. Here is some sample data from our table:

+----------------------------+------------------+-------------------+---------------------+------------+-------------+--------------+---------------+------------------+---------------+----------------+--------------+------------------+
|           Email            | BrandNewCustomer | RecurringCustomer | ReactivatedCustomer | OrderCount | TotalOrders | Date_Created | Customer_Name | Customer_Address | Customer_City | Customer_State | Customer_Zip | Customer_Country |
+----------------------------+------------------+-------------------+---------------------+------------+-------------+--------------+---------------+------------------+---------------+----------------+--------------+------------------+
| zyw@marketplace.amazon.com |                1 |                 0 |                   0 |          1 |           1 | 41:50.0      | Sha           |              990 | BRO           | NY             |          112 | US               |
| zyu@gmail.com              |                1 |                 0 |                   0 |          1 |           1 | 57:25.0      | Zyu           |              181 | Mia           | FL             |          330 | US               |
| ZyR@aol.com                |                1 |                 0 |                   0 |          1 |           1 | 10:19.0      | Day           |              581 | Myr           | SC             |          295 | US               |
| zyr@gmail.com              |                1 |                 0 |                   0 |          1 |           1 | 25:19.0      | Nic           |              173 | Was           | DC             |          200 | US               |
| zy@gmail.com               |                1 |                 0 |                   0 |          1 |           1 | 19:18.0      | Kim           |              675 | MIA           | FL             |          331 | US               |
| zyou@gmail.com             |                1 |                 0 |                   0 |          1 |           1 | 40:29.0      | zoe           |              160 | Mob           | AL             |          366 | US               |
| zyon@yahoo.com             |                1 |                 0 |                   0 |          1 |           1 | 17:21.0      | Zyo           |               84 | Sta           | CT             |          690 | US               |
| zyo@gmail.com              |                1 |                 0 |                   0 |          2 |           2 | 02:03.0      | Zyo           |              432 | Ell           | GA             |          302 | US               |
| zyo@gmail.com              |                1 |                 0 |                   0 |          1 |           2 | 12:54.0      | Zyo           |              432 | Ell           | GA             |          302 | US               |
| zyn@icloud.com             |                1 |                 0 |                   0 |          1 |           1 | 54:56.0      | Zyn           |              916 | Nor           | CA             |          913 | US               |
| zyl@gmail.com              |                0 |                 1 |                   0 |          3 |           3 | 31:27.0      | Ser           |              123 | Mia           | FL             |          331 | US               |
| zyk@marketplace.amazon.com |                1 |                 0 |                   0 |          1 |           1 | 44:00.0      | Myr           |              101 | MIA           | FL             |          331 | US               |
+----------------------------+------------------+-------------------+---------------------+------------+-------------+--------------+---------------+------------------+---------------+----------------+--------------+------------------+

We define a customer by email, so all orders with the same email are grouped under one customer, and we do our calculations on top of that.
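To make that setup concrete, here is a minimal sketch of treating the email as the customer key. It assumes, which the question does not state, that TotalOrders is simply the number of order rows per email; that is consistent with the two zyo@gmail.com rows. The table is trimmed to two columns. The zyo.new@example.com row is hypothetical: it stands for the same person ordering under a new email address.

```sql
-- Trimmed-down version of the question's table (one row per order).
CREATE TABLE Customers (
    Email            VARCHAR(320) NOT NULL,
    Customer_Address VARCHAR(100) NOT NULL
);

INSERT INTO Customers (Email, Customer_Address) VALUES
    ('zyo@gmail.com',       '432'),
    ('zyo@gmail.com',       '432'),
    ('zyo.new@example.com', '432'),  -- hypothetical: same person, new email
    ('zyn@icloud.com',      '916');

-- Email is the customer key, so every distinct email is its own customer.
-- Assumption: TotalOrders = number of order rows that share the email.
SELECT
    Email,
    Customer_Address,
    COUNT(*) OVER (PARTITION BY Email) AS TotalOrders
FROM Customers;
```

The hypothetical row comes back with TotalOrders = 1, so it looks like a brand-new customer even though its address matches zyo@gmail.com. That is the situation the rest of the question is trying to detect.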

Now I am trying to find customers whose emails have changed. To do this, we will try to match customers up by their address.

For each row (that is, with orders separated by email), I want another column called something like Orders_With_Same_Address_Different_Email. How would I do that?

I have tried something with DENSE_RANK, but it doesn't seem to work:

SELECT DISTINCT
Email
,BrandNewCustomer
,RecurringCustomer
,ReactivatedCustomer
,OrderCount
,TotalOrders
,Date_Created
,Customer_Name
,Customer_Address
,Customer_City
,Customer_State
,Customer_Zip
,Customer_Country
,(DENSE_RANK() over (partition by Email order by (case when email <> email then Customer_Address end)  asc) 
+DENSE_RANK() over ( partition by Email order by (case when email <> email then Customer_Address end)  desc) 
- 1) as Orders_With_Same_Name_Different_Email
--*
FROM Customers
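The attempt always yields the same value, and here is why. `email <> email` compares each row's Email with itself, so it is never true. As a result, every CASE returns NULL, both DENSE_RANK calls return 1 for every row, and the expression is always 1 + 1 - 1 = 1.

The ascending-plus-descending DENSE_RANK idea is itself a known workaround for SQL Server rejecting COUNT(DISTINCT ...) OVER (...). Within a partition, a row's two ranks add up to the number of distinct values plus one. The sketch below applies that idea under a few assumptions: the partition is the address, the ranked value is the email, and Email is never NULL (a NULL would be ranked as one more value). The zyo.new@example.com row is hypothetical.

```sql
CREATE TABLE Customers (
    Email            VARCHAR(320) NOT NULL,
    Customer_Address VARCHAR(100) NOT NULL
);

INSERT INTO Customers (Email, Customer_Address) VALUES
    ('zyo@gmail.com',       '432'),
    ('zyo@gmail.com',       '432'),
    ('zyo.new@example.com', '432'),  -- hypothetical: same address, new email
    ('zyn@icloud.com',      '916');

-- Ascending rank + descending rank = (distinct emails at the address) + 1,
-- so subtracting 2 leaves the number of *other* emails at the same address.
SELECT
    Email,
    Customer_Address,
    DENSE_RANK() OVER (PARTITION BY Customer_Address ORDER BY Email ASC)
  + DENSE_RANK() OVER (PARTITION BY Customer_Address ORDER BY Email DESC)
  - 2 AS Other_Emails_At_Same_Address
FROM Customers;
```

Every row at address 432 gets 1 (one other email), and the 916 row gets 0. Subtracting 1 instead of 2 gives the plain distinct-email count, which is what the asker's `- 1` was presumably aiming for.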

Try counting the emails partitioned by address, not by email:

select   Email,
         -- ...

         Orders_With_Same_Name_Different_Email = iif(
             count(email) over (partition by Customer_Address) > 1,
         1, 0)

from     Customers;
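Note that `count(email) over (partition by Customer_Address)` counts order rows. A repeat customer who always used the same email is therefore flagged too: both zyo@gmail.com orders would get 1.

To get the column the question describes, the number of orders at the same address placed under a different email, you can subtract the per-address-and-email count from the per-address count. The sketch below assumes a trimmed table and a hypothetical zyo.new@example.com row.

```sql
CREATE TABLE Customers (
    Email            VARCHAR(320) NOT NULL,
    Customer_Address VARCHAR(100) NOT NULL
);

INSERT INTO Customers (Email, Customer_Address) VALUES
    ('zyo@gmail.com',       '432'),
    ('zyo@gmail.com',       '432'),
    ('zyo.new@example.com', '432'),  -- hypothetical: same address, new email
    ('zyn@icloud.com',      '916');

SELECT
    Email,
    Customer_Address,
    -- orders at this address, minus orders at this address under this row's email
    COUNT(*) OVER (PARTITION BY Customer_Address)
  - COUNT(*) OVER (PARTITION BY Customer_Address, Email)
        AS Orders_With_Same_Address_Different_Email
FROM Customers;
```

Here the zyo@gmail.com rows get 1, the hypothetical row gets 2, and zyn@icloud.com gets 0. Both window functions are plain COUNT(*) OVER (...), so no DISTINCT-in-window workaround is needed. A 1/0 flag is then just a comparison of this value with 0.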

But this is a lesson in why you shouldn't use an email address as a customer identifier. An address is a bad idea as well. Use something that won't change. That usually means creating an internal identifier, such as an auto-incrementing one:

alter table Customers
add customerId int identity(1,1) primary key not null;

Now customerId = 1 will always refer to that particular customer.
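One caveat: the sample table has one row per order (zyo@gmail.com appears twice), so an identity column added to it numbers orders, not customers. The stable key needs to live in a table with one row per customer, and each order should point at that key.

Below is a minimal sketch. The Customer and CustomerOrder tables are hypothetical, and the IDs are given explicitly so the script also runs outside SQL Server. In SQL Server you would declare CustomerId as `int identity(1,1)`, as in the answer.

```sql
-- Hypothetical: one row per customer; this key never changes.
CREATE TABLE Customer (
    CustomerId INTEGER NOT NULL PRIMARY KEY
);

-- Hypothetical: one row per order; the email is just an attribute of the order.
CREATE TABLE CustomerOrder (
    OrderId    INTEGER      NOT NULL PRIMARY KEY,
    CustomerId INTEGER      NOT NULL REFERENCES Customer (CustomerId),
    Email      VARCHAR(320) NOT NULL
);

INSERT INTO Customer (CustomerId) VALUES (1), (2);

INSERT INTO CustomerOrder (OrderId, CustomerId, Email) VALUES
    (1, 1, 'zyo@gmail.com'),
    (2, 1, 'zyo@gmail.com'),
    (3, 1, 'zyo.new@example.com'),  -- hypothetical: same customer, new email
    (4, 2, 'zyn@icloud.com');

-- Grouping by the stable key keeps all three orders under one customer,
-- even though two different emails were used.
SELECT
    CustomerId,
    COUNT(*)              AS TotalOrders,
    COUNT(DISTINCT Email) AS EmailsUsed
FROM CustomerOrder
GROUP BY CustomerId;
```

With this layout, an email change becomes a new value on later orders rather than a new customer.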

You can group by Customer_Address and check the count of distinct emails. This assumes that each customer has one address.

select *
from   Customers
where  Customer_Address in (
           select   Customer_Address
           from     Customers
           group by Customer_Address
           having   count(distinct Email) > 1
       );
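In the sample data, Customer_Address appears to hold only a (truncated) street number such as 990 or 181. Grouping on that column alone can therefore match unrelated customers in different cities.

The sketch below groups on the full address instead and joins the aggregate back, so each returned row also carries its distinct-email count. The two example.com rows are hypothetical. The address columns are assumed to be non-NULL, because NULLs would not match in the join.

```sql
CREATE TABLE Customers (
    Email            VARCHAR(320) NOT NULL,
    Customer_Address VARCHAR(100) NOT NULL,
    Customer_City    VARCHAR(50)  NOT NULL,
    Customer_State   CHAR(2)      NOT NULL,
    Customer_Zip     VARCHAR(10)  NOT NULL
);

INSERT INTO Customers (Email, Customer_Address, Customer_City, Customer_State, Customer_Zip) VALUES
    ('zyo@gmail.com',       '432', 'Ell', 'GA', '302'),
    ('zyo.new@example.com', '432', 'Ell', 'GA', '302'),  -- hypothetical: new email, same address
    ('zyl@gmail.com',       '123', 'Mia', 'FL', '331'),
    ('other@example.com',   '123', 'Was', 'DC', '200');  -- hypothetical: same number, other city

-- Addresses (all four columns) used by more than one email, joined back to every order row.
SELECT c.*, a.Email_Count
FROM Customers AS c
JOIN (
    SELECT Customer_Address, Customer_City, Customer_State, Customer_Zip,
           COUNT(DISTINCT Email) AS Email_Count
    FROM Customers
    GROUP BY Customer_Address, Customer_City, Customer_State, Customer_Zip
    HAVING COUNT(DISTINCT Email) > 1
) AS a
  ON  a.Customer_Address = c.Customer_Address
  AND a.Customer_City    = c.Customer_City
  AND a.Customer_State   = c.Customer_State
  AND a.Customer_Zip     = c.Customer_Zip;
```

Only the two 432 Ell GA rows come back, each with Email_Count = 2. Grouping on Customer_Address alone would also have flagged the two unrelated 123 rows.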

If I understand what you want to do, this is how I would solve it:

Note that you don't need the HAVING clause in the CTE, but depending on your data it could make the query faster (that is, if you have a large dataset).

WITH addr2email AS
(
  -- number of distinct emails used at each address
  select customer_address, count(distinct email) as email_cnt
  from customers
  group by customer_address
  having count(distinct email) > 1
)

SELECT 
    Email
    ,BrandNewCustomer
    ,RecurringCustomer
    ,ReactivatedCustomer
    ,OrderCount
    ,TotalOrders
    ,Date_Created
    ,Customer_Name
    ,customers.Customer_Address
    ,Customer_City
    ,Customer_State
    ,Customer_Zip
    ,Customer_Country
    ,CASE when coalesce(addr2email.email_cnt,1) > 1 then 'Y' ELSE 'N' END as has_more_than_1_email 
from customers
left join addr2email on customers.customer_address = addr2email.customer_address;

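Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM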