Flag gaps and new entries in time series database - customer attrition & new customers
I am trying to flag new customers and attrited customers in my database. The objective is to have a simple flat table from which I can pull "new customers" and "lost customers" for a given business and a given year.
I have a table that looks like this:
BUSINESS, CUSTOMER, YEAR
Business X, Customer A, 2001
Business X, Customer A, 2002
Business X, Customer A, 2003
Business X, Customer B, 2004
Business X, Customer B, 2005
Business Y, Customer A, 2004
I would like to add two new columns to my table that flag whether a customer is "NEW" in that year, or is "GONE" the next year for that business line. The end result should look like this:
BUSINESS, CUSTOMER, YEAR, NEW, GONE
Business X, Customer A, 2001, NEW, NULL
Business X, Customer A, 2002, NULL, NULL
Business X, Customer A, 2003, NULL, GONE
Business X, Customer B, 2004, NEW, NULL
Business X, Customer B, 2005, NULL, GONE
Business Y, Customer A, 2004, NEW, NULL
Thanks so much in advance for your help. I am working on this in SQL but also in Google Cloud Dataprep; I am a terrible coder and very open to brute-force techniques!
One possible solution uses correlated subqueries with an EXISTS() condition.
In the first subquery, we check whether any earlier YEAR exists for the same BUSINESS and CUSTOMER as the current row. If one exists, we set NEW to NULL (since another row already exists before this year); otherwise we set it to 'NEW'.
In the second subquery, we check whether any later YEAR exists for the same BUSINESS and CUSTOMER as the current row. If one exists, we set GONE to NULL (since another row already exists after this year); otherwise we set it to 'GONE'.
SELECT
  t1.BUSINESS,
  t1.CUSTOMER,
  t1.YEAR,
  -- NEW: no earlier year exists for this business/customer pair
  IF ( EXISTS(SELECT 1
              FROM your_table AS t2
              WHERE t2.BUSINESS = t1.BUSINESS AND
                    t2.CUSTOMER = t1.CUSTOMER AND
                    t2.YEAR < t1.YEAR
              LIMIT 1), NULL, 'NEW' ) AS NEW,
  -- GONE: no later year exists for this business/customer pair
  IF ( EXISTS(SELECT 1
              FROM your_table AS t3
              WHERE t3.BUSINESS = t1.BUSINESS AND
                    t3.CUSTOMER = t1.CUSTOMER AND
                    t3.YEAR > t1.YEAR
              LIMIT 1), NULL, 'GONE' ) AS GONE
FROM your_table AS t1
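To try the query above against the sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module. Several things in it are assumptions made for illustration and not part of the original answer: the in-memory SQLite database, the helper name flag_customers, the added ORDER BY, and the use of CASE WHEN ... END in place of IF(), which was swapped in only because CASE is the more portable form.

```python
import sqlite3

# Sample rows copied from the table in the question.
ROWS = [
    ("Business X", "Customer A", 2001),
    ("Business X", "Customer A", 2002),
    ("Business X", "Customer A", 2003),
    ("Business X", "Customer B", 2004),
    ("Business X", "Customer B", 2005),
    ("Business Y", "Customer A", 2004),
]

# Same correlated-subquery logic as the query above.
# Assumption: CASE WHEN replaces IF() for portability; ORDER BY is added
# only to make the output order deterministic.
QUERY = """
SELECT
  t1.BUSINESS,
  t1.CUSTOMER,
  t1.YEAR,
  CASE WHEN EXISTS (SELECT 1
                    FROM your_table AS t2
                    WHERE t2.BUSINESS = t1.BUSINESS
                      AND t2.CUSTOMER = t1.CUSTOMER
                      AND t2.YEAR < t1.YEAR)
       THEN NULL ELSE 'NEW' END AS "NEW",
  CASE WHEN EXISTS (SELECT 1
                    FROM your_table AS t3
                    WHERE t3.BUSINESS = t1.BUSINESS
                      AND t3.CUSTOMER = t1.CUSTOMER
                      AND t3.YEAR > t1.YEAR)
       THEN NULL ELSE 'GONE' END AS "GONE"
FROM your_table AS t1
ORDER BY t1.BUSINESS, t1.CUSTOMER, t1.YEAR
"""


def flag_customers(rows):
    """Load rows into an in-memory table and return the flagged result.

    flag_customers is a hypothetical helper name used for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE your_table (BUSINESS TEXT, CUSTOMER TEXT, YEAR INTEGER)"
        )
        conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in flag_customers(ROWS):
        print(row)
```

One thing worth noticing when running this: the last row (Business Y, Customer A, 2004) comes back flagged as both NEW and GONE, because no later year exists for that pair. The desired output in the question shows NULL in the GONE column for that row, so if that is really what is wanted, the GONE condition would need an extra rule that the question does not spell out.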