I have a data frame in which each row is a transaction carried out by one customer on one day. The same customer can have multiple transactions on the same day and on different dates. I want to add a column giving each customer's number of previous visits.
id date purchase
id1 date1 $10
id1 date1 $50
id1 date2 $30
id2 date1 $10
id2 date2 $10
id2 date3 $10
After adding the visit column:
id date purchase visit
id1 date1 $10 0
id1 date1 $50 0
id1 date2 $30 1
id2 date1 $10 0
id2 date2 $10 1
id2 date3 $10 2
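I do this in pandas using factorize:
import pandas as pd
# Number each customer's distinct dates in order of first appearance (0-based).
# Assign with df['visits']; df.visits = ... does not create a new column.
df['visits'] = df.groupby('id')['date'].transform(lambda x: pd.factorize(x)[0])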
I want to do the same in SQL. What would the query look like?
You need DENSE_RANK() with PARTITION BY:
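To make it clearer what that combination computes, here is a rough pure-Python sketch of the same logic. The function name previous_visits is made up for illustration, and the dates are assumed to sort correctly as plain strings, as they do in the example data.

```python
from collections import defaultdict

def previous_visits(rows):
    """rows: list of (id, date) tuples. Returns one visit number per row."""
    # PARTITION BY id: collect the distinct dates seen for each customer.
    dates_by_id = defaultdict(set)
    for cust_id, date in rows:
        dates_by_id[cust_id].add(date)

    # ORDER BY date with dense ranking: equal dates share a rank and the
    # ranks have no gaps. enumerate() starts at 0, i.e. DENSE_RANK() - 1.
    rank = {
        cust_id: {d: i for i, d in enumerate(sorted(dates))}
        for cust_id, dates in dates_by_id.items()
    }
    return [rank[cust_id][date] for cust_id, date in rows]

rows = [
    ("id1", "date1"), ("id1", "date1"), ("id1", "date2"),
    ("id2", "date1"), ("id2", "date2"), ("id2", "date3"),
]
visits = previous_visits(rows)
print(visits)  # [0, 0, 1, 0, 1, 2]
```

One difference from the pandas version in the question: pd.factorize numbers values in order of first appearance, while DENSE_RANK() follows the ORDER BY. The two agree only when each customer's rows are already sorted by date.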
Creation of example dataset:
IF OBJECT_ID('Source', 'U') IS NOT NULL
    DROP TABLE Source;

CREATE TABLE Source
(
    id varchar(30),
    Date varchar(30),
    purchase varchar(30)
);

INSERT INTO Source
VALUES
    ('id1', 'date1', '$10'),
    ('id1', 'date1', '$50'),
    ('id1', 'date2', '$30'),
    ('id2', 'date1', '$10'),
    ('id2', 'date2', '$10'),
    ('id2', 'date3', '$10');

-- Rank each customer's distinct dates; subtract 1 so the first visit is 0.
SELECT *,
    DENSE_RANK() OVER (PARTITION BY id ORDER BY date) - 1 AS visit
FROM Source;
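Output:
id date purchase visit
id1 date1 $10 0
id1 date1 $50 0
id1 date2 $30 1
id2 date1 $10 0
id2 date2 $10 1
id2 date3 $10 2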
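If you don't have a SQL Server instance handy, the same window function can be tried from Python with the standard-library sqlite3 module. This is only a sketch under a few assumptions: the bundled SQLite must be version 3.25 or newer (the first with window functions), the table is created with plain TEXT columns rather than the varchar(30) columns above, and an ORDER BY is added only to make the row order predictable.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Source (id TEXT, date TEXT, purchase TEXT)")
conn.executemany(
    "INSERT INTO Source VALUES (?, ?, ?)",
    [
        ("id1", "date1", "$10"),
        ("id1", "date1", "$50"),
        ("id1", "date2", "$30"),
        ("id2", "date1", "$10"),
        ("id2", "date2", "$10"),
        ("id2", "date3", "$10"),
    ],
)

# Same DENSE_RANK() expression as in the answer; needs SQLite >= 3.25.
query = """
SELECT id, date, purchase,
       DENSE_RANK() OVER (PARTITION BY id ORDER BY date) - 1 AS visit
FROM Source
ORDER BY id, date
"""
result = conn.execute(query).fetchall()
conn.close()

for row in result:
    print(row)
```

Note that the dates here are strings, so ORDER BY date compares them as text. With real data, storing the column as a proper date type (or in ISO yyyy-mm-dd format) keeps the ranking chronological.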