SQL query to do pandas factorize. Cumulative sum after a group by?

I have this data frame, where each row is a transaction carried out by one customer on a given day. A customer can have multiple transactions on the same day and on different dates. I want to add a column holding each customer's number of previous visits.

id   date   purchase
id1  date1  $10
id1  date1  $50
id1  date2  $30
id2  date1  $10
id2  date2  $10
id2  date3  $10

After adding the visit column:

id   date   purchase  visit
id1  date1  $10       0
id1  date1  $50       0
id1  date2  $30       1
id2  date1  $10       0
id2  date2  $10       1
id2  date3  $10       2

I do this in pandas using factorize:

import pandas as pd

# Number each customer's distinct dates in order of first appearance (0-based)
df['visit'] = df.groupby('id')['date'].transform(lambda x: pd.factorize(x)[0])
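For readers less familiar with factorize, what the groupby plus factorize step computes can be sketched in plain Python without pandas. This is only an illustration: it assumes the rows are (id, date, purchase) tuples, and the helper name `visit_numbers` is hypothetical, not part of pandas.

```python
def visit_numbers(rows):
    """Return a 0-based visit index per row.

    For each id, distinct dates are numbered in order of first appearance,
    which mirrors what pd.factorize does within each group.
    """
    seen = {}    # id -> {date: visit index}
    visits = []
    for cust_id, date, _purchase in rows:
        dates = seen.setdefault(cust_id, {})
        # A date not seen before for this id gets the next free index.
        visits.append(dates.setdefault(date, len(dates)))
    return visits


rows = [
    ("id1", "date1", "$10"),
    ("id1", "date1", "$50"),
    ("id1", "date2", "$30"),
    ("id2", "date1", "$10"),
    ("id2", "date2", "$10"),
    ("id2", "date3", "$10"),
]
print(visit_numbers(rows))
```

Note that the numbering follows the order in which dates appear, not their sorted order, so the result equals the number of previous visits only when each customer's rows are already in date order.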

I want to do it through SQL. What would the query look like?

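You need DENSE_RANK() with PARTITION BY. DENSE_RANK() numbers the distinct dates within each id starting from 1 with no gaps, so subtracting 1 gives the number of previous visits.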

Creation of example dataset:

IF OBJECT_ID('Source', 'U') IS NOT NULL 
  DROP TABLE Source; 

CREATE TABLE Source
(
  id varchar(30),
  Date varchar(30),
  purchase varchar(30)
)

INSERT INTO Source
VALUES
('id1', 'date1', '$10'),   
('id1', 'date1', '$50'),   
('id1', 'date2', '$30'),    
('id2', 'date1', '$10'),   
('id2', 'date2', '$10'),  
('id2', 'date3', '$10')

SELECT *, 
  DENSE_RANK() OVER (PARTITION BY id ORDER BY date) - 1 AS visit
FROM Source

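Output:

id   Date   purchase  visit
id1  date1  $10       0
id1  date1  $50       0
id1  date2  $30       1
id2  date1  $10       0
id2  date2  $10       1
id2  date3  $10       2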
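The DENSE_RANK() query can also be tried without a SQL Server instance. The sketch below is an assumption-laden stand-in, not the answer's setup: it uses Python's built-in sqlite3 module with an in-memory SQLite database (window functions need SQLite 3.25 or later), and it adds an ORDER BY to the outer query only so that the printed row order is deterministic.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server here; the table and column
# names follow the example dataset above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Source (id TEXT, date TEXT, purchase TEXT)")
conn.executemany(
    "INSERT INTO Source VALUES (?, ?, ?)",
    [
        ("id1", "date1", "$10"),
        ("id1", "date1", "$50"),
        ("id1", "date2", "$30"),
        ("id2", "date1", "$10"),
        ("id2", "date2", "$10"),
        ("id2", "date3", "$10"),
    ],
)

query = """
    SELECT id, date, purchase,
           DENSE_RANK() OVER (PARTITION BY id ORDER BY date) - 1 AS visit
    FROM Source
    ORDER BY id, date, purchase
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
conn.close()
```

Because the example stores dates as text, ORDER BY date compares strings. With real data a proper date type (or ISO 8601 strings) would be needed for the ranking to follow chronological order.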
