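+------+--------+
| week | cookie |
+------+--------+
| 1 | a |
| 1 | b |
| 1 | c |
| 1 | d |
| 2 | a |
| 2 | b |
| 3 | a |
| 3 | c |
| 3 | d |
+------+--------+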
This table records visits to a website by week. Each cookie represents an individual person, and each row means that person visited the site in that week. For example, the last row means 'd' came to the site in week 3.
Given a start week, I want to find out how many of the same people keep coming back in each following week.
For example, if I start at week 1, I expect a result like:
week | visitors
1 | 4
2 | 2
3 | 1
This is because 4 users came in week 1, only 2 of them (a, b) came back in week 2, and only 1 of them (a) came in all 3 weeks.
How can I write a SELECT query to find this out? The table will be large (there might be 100 weeks), so I want to find an efficient way to do it.
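To pin down the expected numbers, here is a minimal Python sketch (not SQL) of the intended semantics, assuming the data fits in memory as (week, cookie) pairs; the function name retention is hypothetical. It intersects each following week's visitors with the group still "alive", so it stops at the first week in which nobody from the original group returns.

```python
from collections import defaultdict

# Sample data from the question: (week, cookie) pairs.
VISITS = [(1, "a"), (1, "b"), (1, "c"), (1, "d"),
          (2, "a"), (2, "b"),
          (3, "a"), (3, "c"), (3, "d")]


def retention(visits, start_week):
    """Return [(week, count)] where count is the number of start-week
    visitors who also visited every week from start_week up to week."""
    by_week = defaultdict(set)
    for week, cookie in visits:
        by_week[week].add(cookie)

    result = []
    survivors = set(by_week.get(start_week, set()))
    week = start_week
    # Walk forward one week at a time, keeping only cookies seen every week.
    while survivors:
        result.append((week, len(survivors)))
        week += 1
        survivors &= by_week.get(week, set())
    return result


print(retention(VISITS, 1))  # [(1, 4), (2, 2), (3, 1)]
print(retention(VISITS, 2))  # [(2, 2), (3, 1)]
```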
This query uses MySQL user variables to track adjacent rows and work out whether their weeks are consecutive:
set @start_week = 2, @week := 0, @conseq := 0, @cookie := '';
-- relies on left-to-right evaluation of the user variables, which MySQL
-- does not guarantee (and deprecates in 8.0)
select conseq_weeks, count(*)
from (
  select
    cookie,
    -- run length so far: reset on a new cookie or a gap in the weeks
    if(cookie != @cookie or week != @week + 1, @conseq := 0, @conseq := @conseq + 1) + 1 as conseq_weeks,
    -- keep only runs that began exactly at @start_week
    @conseq = week - @start_week as conseq,
    @cookie := cookie as lastcookie,
    @week := week as lastweek
  from (select week, cookie from webhist where week >= @start_week order by cookie, week) x
) y
where conseq
group by conseq_weeks;
This is for week 2. For another week, change the @start_week variable at the top.
Here's the test:
create table webhist(week int, cookie char);
insert into webhist values (1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'), (2, 'a'), (2, 'b'), (3, 'a'), (3, 'c'), (3, 'd');
Output of the above query with @start_week = 1 (that is, where week >= 1):
+--------------+----------+
| conseq_weeks | count(*) |
+--------------+----------+
| 1 | 4 |
| 2 | 2 |
| 3 | 1 |
+--------------+----------+
Output of the above query with @start_week = 2 (that is, where week >= 2):
+--------------+----------+
| conseq_weeks | count(*) |
+--------------+----------+
| 1 | 2 |
| 2 | 1 |
+--------------+----------+
P.S. Good question, but a bit of a ball-breaker.
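The variable bookkeeping in the query above can be mirrored in a short Python sketch, which is handy for checking the logic without a MySQL server. It assumes rows are (week, cookie) pairs and uses a hypothetical function name, conseq_counts. A row is kept only when the run ending there started exactly at the start week, i.e. when the run length minus one equals week - start_week.

```python
# Sample data from the question: (week, cookie) pairs.
VISITS = [(1, "a"), (1, "b"), (1, "c"), (1, "d"),
          (2, "a"), (2, "b"),
          (3, "a"), (3, "c"), (3, "d")]


def conseq_counts(visits, start_week):
    """Scan rows ordered by (cookie, week), tracking the previous cookie and
    week the way @cookie and @week do, and count unbroken runs from start_week."""
    prev_cookie, prev_week, conseq = "", 0, 0   # mirrors the SET line
    counts = {}
    rows = sorted((c, w) for w, c in visits if w >= start_week)
    for cookie, week in rows:
        # Reset on a new cookie or a gap in the weeks, otherwise extend the run.
        if cookie != prev_cookie or week != prev_week + 1:
            conseq = 0
        else:
            conseq += 1
        conseq_weeks = conseq + 1
        # Only count runs that began exactly at start_week.
        if conseq == week - start_week:
            counts[conseq_weeks] = counts.get(conseq_weeks, 0) + 1
        prev_cookie, prev_week = cookie, week
    return sorted(counts.items())


print(conseq_counts(VISITS, 1))  # [(1, 4), (2, 2), (3, 1)]
print(conseq_counts(VISITS, 2))  # [(1, 2), (2, 1)]
```

A cookie that first appears after the start week (for example one visiting only weeks 2 and 3 when starting at week 1) never satisfies the run check, so it is not counted.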
For some reason most of these answers are overcomplicated; this doesn't need cursors, loops or anything of the sort.
I want to find out how many (same) people keep coming back in the following week, when given a start week to look at.
If you want to know, for each week, how many of the users who visited the start week visited in that week and then came back the following week:
SELECT visits.week, COUNT(1) AS NumRepeatUsers
FROM visits
WHERE EXISTS (
  -- came back the following week
  SELECT 1
  FROM visits AS nextWeek
  WHERE nextWeek.week = visits.week + 1
    AND nextWeek.cookie = visits.cookie
)
AND EXISTS (
  -- also visited the start week @week
  SELECT 1
  FROM visits AS searchWeek
  WHERE searchWeek.week = @week
    AND searchWeek.cookie = visits.cookie
)
GROUP BY visits.week
ORDER BY visits.week
However, this will not show you diminishing results over time. If you have 10 users in week 1 and then 5 different users visit in each of the next 5 weeks, you would keep seeing 1=10, 2=5, 3=5, 4=5, 5=5, 6=5 and so on. Instead, you want to see 5=x, where x is the number of users who visited every week for 5 weeks straight. To do this, see below:
SELECT visits.week, COUNT(1) AS NumRepeatUsers
FROM visits
WHERE EXISTS (
  -- visited the start week @week
  SELECT 1
  FROM visits AS searchWeek
  WHERE searchWeek.week = @week
    AND searchWeek.cookie = visits.cookie
)
-- and visited every week from @week + 1 up to this row's week
-- (assumes at most one row per week and cookie)
AND visits.week - @week = (
  SELECT COUNT(1)
  FROM visits AS searchWeek
  WHERE searchWeek.week BETWEEN @week + 1 AND visits.week
    AND searchWeek.cookie = visits.cookie
)
GROUP BY visits.week
ORDER BY visits.week
This will give you something like 1=10, 2=5, 3=4, 4=3, 5=2, 6=1.
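As a quick check of the second query's counting idea, here is a sketch that runs it through Python's built-in sqlite3 module as a stand-in for MySQL or SQL Server. It assumes one row per (week, cookie), correlates both subqueries on searchWeek.cookie, has no next-week EXISTS, and binds the start week as the named parameter :week in place of @week. The helper name repeat_users is hypothetical.

```python
import sqlite3

# Sample data from the question: one row per (week, cookie).
SAMPLE = [(1, "a"), (1, "b"), (1, "c"), (1, "d"),
          (2, "a"), (2, "b"),
          (3, "a"), (3, "c"), (3, "d")]

QUERY = """
SELECT visits.week, COUNT(1) AS NumRepeatUsers
FROM visits
WHERE EXISTS (                      -- visited the start week
    SELECT 1 FROM visits AS searchWeek
    WHERE searchWeek.week = :week
      AND searchWeek.cookie = visits.cookie)
  AND visits.week - :week = (       -- and every week since, up to this one
    SELECT COUNT(1) FROM visits AS searchWeek
    WHERE searchWeek.week BETWEEN :week + 1 AND visits.week
      AND searchWeek.cookie = visits.cookie)
GROUP BY visits.week
ORDER BY visits.week
"""


def repeat_users(rows, start_week):
    """Run QUERY against an in-memory SQLite copy of rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE visits (week INTEGER, cookie TEXT)")
        conn.executemany("INSERT INTO visits VALUES (?, ?)", rows)
        return conn.execute(QUERY, {"week": start_week}).fetchall()
    finally:
        conn.close()


print(repeat_users(SAMPLE, 1))  # [(1, 4), (2, 2), (3, 1)]
print(repeat_users(SAMPLE, 2))  # [(2, 2), (3, 1)]
```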
This is an interesting one.
I try to work out the final week each person visited.
This is calculated as the first week on or after the start where the following week doesn't have a visit.
Once you know each user's final visiting week you just count up, for every week, the number of different users whose final visit was on or after that week.
SELECT wks.week, COUNT(cookie) as Visitors
FROM (SELECT a.cookie, MIN(a.week) AS FinalVisit
FROM WeekVisits a
INNER JOIN WeekVisits FirstWeek
ON a.cookie = FirstWeek.cookie
WHERE a.week >= 1
AND FirstWeek.week = 1
AND NOT EXISTS (SELECT 1
FROM WeekVisits b
WHERE b.week = a.week + 1
AND b.cookie = a.cookie)
GROUP BY a.cookie) fv
INNER JOIN
(SELECT DISTINCT week
FROM WeekVisits
WHERE week >= 1) wks
ON fv.FinalVisit >= wks.week
GROUP BY wks.week
ORDER BY wks.week
EDIT
-Thanks ypercube for noticing. I had also lost the group by from the "fv" query. Oops.
-I've removed the comments denoting parameters.
-I've removed the unnecessary distinct.
EDIT again
-Added the extra FirstWeek join because it didn't cope with starting on week 2.
When I run this (admittedly on MS Access) starting at week 1, I get:
+------+----------+
| week | Visitors |
+------+----------+
| 1 | 4 |
| 2 | 2 |
| 3 | 1 |
+------+----------+
Starting at week 2, I get:
+------+----------+
| week | Visitors |
+------+----------+
| 2 | 2 |
| 3 | 1 |
+------+----------+
These are as expected.
(To start on week 2, you would change the 1 to 2 in the three places where it is compared with the week column.)
The method seems sound but the syntax may need adjusting for MySQL.
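To try the final-visit query without MS Access or MySQL, here is a sketch using Python's sqlite3 module, with the three hard-coded 1s replaced by a named parameter :start. The table name WeekVisits comes from the answer; the sample rows are the question's data and the helper name final_visit_counts is hypothetical.

```python
import sqlite3

# Sample data from the question: one row per (week, cookie).
SAMPLE = [(1, "a"), (1, "b"), (1, "c"), (1, "d"),
          (2, "a"), (2, "b"),
          (3, "a"), (3, "c"), (3, "d")]

QUERY = """
SELECT wks.week, COUNT(cookie) AS Visitors
FROM (SELECT a.cookie, MIN(a.week) AS FinalVisit   -- end of each unbroken run
      FROM WeekVisits a
      INNER JOIN WeekVisits FirstWeek
        ON a.cookie = FirstWeek.cookie
      WHERE a.week >= :start
        AND FirstWeek.week = :start                -- must have visited the start week
        AND NOT EXISTS (SELECT 1
                        FROM WeekVisits b
                        WHERE b.week = a.week + 1
                          AND b.cookie = a.cookie)
      GROUP BY a.cookie) fv
INNER JOIN
     (SELECT DISTINCT week
      FROM WeekVisits
      WHERE week >= :start) wks
  ON fv.FinalVisit >= wks.week
GROUP BY wks.week
ORDER BY wks.week
"""


def final_visit_counts(rows, start_week):
    """Run QUERY against an in-memory SQLite copy of rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE WeekVisits (week INTEGER, cookie TEXT)")
        conn.executemany("INSERT INTO WeekVisits VALUES (?, ?)", rows)
        return conn.execute(QUERY, {"start": start_week}).fetchall()
    finally:
        conn.close()


print(final_visit_counts(SAMPLE, 1))  # [(1, 4), (2, 2), (3, 1)]
print(final_visit_counts(SAMPLE, 2))  # [(2, 2), (3, 1)]
```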
Okay, let's say your table is called visits and you are interested in week number n. You want to know, for every week number w >= n, which users appear in every single such week w.
So how many such weeks are there?
select count(distinct week)
from visits
where week >= n;
And in how many such weeks did each user visit?
select cookie, count(distinct week)
from visits
where week >= n
group by cookie;
Suppose you have weeks 1, 3, 4, 5, 6, 7, 9, 10, and 13, and you are interested in week 5. So the first query above gives you 6, because there are 6 weeks of interest: 5, 6, 7, 9, 10, and 13. The second query will give you, for each user, how many of those weeks they visited in. Now you want to know for how many of those users the count is 6.
I think this works:
select cookie, count(distinct week)
from visits
where week >= n
group by cookie
having count(distinct week) = (
  select count(distinct week)
  from visits
  where week >= n);
but I don't have access to MySQL right now. If it doesn't work, then perhaps the approach makes some sense and sets you in the right direction. EDIT: I will be able to test tomorrow.
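Here is a sketch that runs both counting queries through Python's sqlite3 module, using COUNT(DISTINCT week) so that each week is counted once even though the table holds one row per visit. It assumes the question's visits(week, cookie) layout and binds n as a named parameter; loyal_visitors is a hypothetical helper. The second dataset reproduces the worked example with weeks 1, 3, 4, 5, 6, 7, 9, 10 and 13, using a single made-up cookie 'x'.

```python
import sqlite3

# Sample data from the question: one row per (week, cookie).
SAMPLE = [(1, "a"), (1, "b"), (1, "c"), (1, "d"),
          (2, "a"), (2, "b"),
          (3, "a"), (3, "c"), (3, "d")]

# Worked example: one hypothetical cookie "x" present in every listed week.
WORKED_EXAMPLE = [(w, "x") for w in (1, 3, 4, 5, 6, 7, 9, 10, 13)]

LOYAL = """
SELECT cookie, COUNT(DISTINCT week)
FROM visits
WHERE week >= :n
GROUP BY cookie
HAVING COUNT(DISTINCT week) = (
    SELECT COUNT(DISTINCT week) FROM visits WHERE week >= :n)
ORDER BY cookie
"""


def loyal_visitors(rows, n):
    """Return (number of distinct weeks >= n, cookies seen in all of them)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE visits (week INTEGER, cookie TEXT)")
        conn.executemany("INSERT INTO visits VALUES (?, ?)", rows)
        num_weeks = conn.execute(
            "SELECT COUNT(DISTINCT week) FROM visits WHERE week >= :n", {"n": n}
        ).fetchone()[0]
        return num_weeks, conn.execute(LOYAL, {"n": n}).fetchall()
    finally:
        conn.close()


print(loyal_visitors(SAMPLE, 1))          # (3, [('a', 3)])
print(loyal_visitors(WORKED_EXAMPLE, 5))  # (6, [('x', 6)])
```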
Use a self-join:
SELECT ... FROM visits AS v1 LEFT JOIN visits AS v2 ON v2.cookie = v1.cookie AND v2.week = v1.week + 1
WHERE v2.week IS NOT NULL
GROUP BY v1.cookie
This will give you records of second and later visits.
But I think it would be better just to GROUP BY cookie, which gives you the number of visits per cookie; any count above 1 is a returning user.
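A small sqlite3 sketch of both ideas follows. The original elides the select list (SELECT ...), so the columns used here (v1.cookie and a count named back_next_week) are assumptions, as are the helper name run and the num_visits alias; the self-join matches on cookie as well as on consecutive weeks.

```python
import sqlite3

# Sample data from the question: one row per (week, cookie).
SAMPLE = [(1, "a"), (1, "b"), (1, "c"), (1, "d"),
          (2, "a"), (2, "b"),
          (3, "a"), (3, "c"), (3, "d")]

# How many times each cookie came back in the very next week.
SELF_JOIN = """
SELECT v1.cookie, COUNT(*) AS back_next_week
FROM visits AS v1
LEFT JOIN visits AS v2
  ON v2.cookie = v1.cookie AND v2.week = v1.week + 1
WHERE v2.week IS NOT NULL
GROUP BY v1.cookie
ORDER BY v1.cookie
"""

# Total visits per cookie; more than one means a returning user.
VISITS_PER_COOKIE = """
SELECT cookie, COUNT(*) AS num_visits
FROM visits
GROUP BY cookie
HAVING COUNT(*) > 1
ORDER BY cookie
"""


def run(rows, sql):
    """Execute sql against an in-memory SQLite copy of rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE visits (week INTEGER, cookie TEXT)")
        conn.executemany("INSERT INTO visits VALUES (?, ?)", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


print(run(SAMPLE, SELF_JOIN))          # [('a', 2), ('b', 1)]
print(run(SAMPLE, VISITS_PER_COOKIE))  # [('a', 3), ('b', 2), ('c', 2), ('d', 2)]
```

Note that the plain per-cookie count treats c and d as returning users even though they skipped week 2, while the self-join only counts back-to-back weeks.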
This is my solution. It is not really straightforward, but (as I have tested) it does solve your problem:
First we declare a stored procedure that returns the visitors in a particular week as a comma-separated string. You could use GROUP_CONCAT instead if you wish, but I did it this way because GROUP_CONCAT output is limited in length (by group_concat_max_len).
DELIMITER $$
DROP PROCEDURE IF EXISTS `db`.`get_visitors_for_week`$$
CREATE DEFINER=`root`@`localhost` PROCEDURE `get_visitors_for_week`(id_week INTEGER, OUT result TEXT)
BEGIN
DECLARE should_continue INT DEFAULT 0;
DECLARE c_cookie CHAR(1);
DECLARE r CURSOR FOR SELECT v.cookie
FROM visits v WHERE v.week = id_week;
DECLARE CONTINUE HANDLER FOR NOT FOUND
SET should_continue = 1;
OPEN r;
REPEAT
SET c_cookie = NULL;
FETCH r INTO c_cookie;
IF c_cookie IS NOT NULL THEN
IF result IS NULL OR result = '' THEN
SET result = c_cookie;
ELSE SET result = CONCAT(result,',',c_cookie);
END IF;
END IF;
UNTIL should_continue = 1
END REPEAT;
CLOSE r;
END$$
DELIMITER ;
Then we declare a function to wrap that stored procedure, so we can call it conveniently inside a query:
DELIMITER $$
DROP FUNCTION IF EXISTS `db`.`concat_values`$$
CREATE DEFINER=`root`@`localhost` FUNCTION `concat_values`(id_week INTEGER) RETURNS TEXT CHARSET latin1
BEGIN
DECLARE result TEXT;
CALL get_visitors_for_week(id_week, result);
RETURN result;
END$$
DELIMITER ;
And then, for each week, we count the visitors who came both that week and the previous week; we "see" that by searching for the cookie in the previous week's concatenated list. This is the final query:
SELECT
v.week,
SUM(IF(ISNULL(concat_values(v.week - 1)) OR INSTR(concat_values(v.week - 1), v.cookie) > 0, 1, 0)) AS Visitors
FROM (SELECT
v.week,
v.cookie,
vt.visitors
FROM visits v
INNER JOIN (SELECT DISTINCT
v.week,
concat_values(v.week) AS visitors
FROM visits v) AS vt
ON v.week = vt.week) AS v
WHERE v.week >= 1
GROUP BY v.week
Substitute the 1 in the condition v.week >= 1 with the week number you want to start from.
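The core check in the final query (was this cookie in the previous week's list?) can be sketched in Python with sets instead of a concatenated string. This is a swapped-in technique for illustration: set membership avoids INSTR matching one cookie as a substring of another (for example 'a' inside 'ab'). It assumes (week, cookie) rows, and pairwise_returns is a hypothetical name. As with the query, an empty previous week counts every visitor of the current week. It compares each week only with the one before it, so a visitor who skipped the start week but came in two later consecutive weeks is counted too, as the extra 'e' rows below show.

```python
from collections import defaultdict

# Sample data from the question: (week, cookie) pairs.
VISITS = [(1, "a"), (1, "b"), (1, "c"), (1, "d"),
          (2, "a"), (2, "b"),
          (3, "a"), (3, "c"), (3, "d")]


def pairwise_returns(visits, start_week):
    """For each week >= start_week, count visitors also seen the week before."""
    by_week = defaultdict(set)
    for week, cookie in visits:
        by_week[week].add(cookie)

    result = []
    for week in sorted(w for w in by_week if w >= start_week):
        previous = by_week.get(week - 1, set())
        current = by_week[week]
        # Mirrors ISNULL(list) OR INSTR(list, cookie) > 0 using set membership.
        count = len(current) if not previous else len(current & previous)
        result.append((week, count))
    return result


print(pairwise_returns(VISITS, 1))                              # [(1, 4), (2, 2), (3, 1)]
print(pairwise_returns(VISITS + [(2, "e"), (3, "e")], 1))      # [(1, 4), (2, 2), (3, 2)]
```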