Iterate over rows using SQL
I have a table in a Redshift database containing event data. Each row is one event. Every event has an eventid, but not the sessionid that I now need. I have extracted a sample of the table (a subset of columns, and only events from one userid):
time        userid          eventid     sessionstart  sessionstop
1498639773  101xnmnd1ohi62  504747459   t             f
1498639777  101xnmnd1ohi62  1479311450  f             f
1498639803  101xnmnd1ohi62  808610184   f             f
1498639816  101xnmnd1ohi62  335000637   f             f
1498639903  101xnmnd1ohi62  238269920   f             f
1498639906  101xnmnd1ohi62  990687838   f             f
1498639952  101xnmnd1ohi62  781472797   f             t
1498650109  101xnmnd1ohi62  1826568537  t             f
1498650124  101xnmnd1ohi62  2079795673  f             f
1498650365  101xnmnd1ohi62  578922176   f             t
This is ordered by userid and time, so that the events are displayed in the correct order according to session activity. Every event has a boolean value for sessionstart and sessionstop. By looking at the list of events, I can identify the sessions by finding all events between (and including) an event with sessionstart = true and the next event with sessionstop = true.

In the events listed here, there are two sessions. The first session starts with eventid 504747459 and ends with 781472797. The second session starts with eventid 1826568537 and ends with 578922176. What I want to do is mark these two sessions (and all other sessions) with a sessionid, using SQL.

I haven't found any way to do this using SQL. It would be possible using e.g. Python, but I believe the performance would be very poor, so SQL is preferred.
Does anyone have a tip on how I can solve this?
I think it might be easier just to use sessionstart, assuming that there are no events in between one session's end and the next session's start.
If so:
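-- Running count of sessionstart events per user = that user's session number.
-- Redshift requires an explicit frame clause (rows unbounded preceding) here.
select e.*,
       sum(case when sessionstart then 1 else 0 end)
           over (partition by userid order by time rows unbounded preceding) as user_sessionid
from events e;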
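To sanity-check the running-sum idea on the sample rows from the question, here is a minimal sketch that loads them into an in-memory SQLite database with Python's built-in `sqlite3` module and runs the per-user query. SQLite is only an illustrative stand-in for Redshift (an assumption; it needs SQLite 3.25 or later for window functions), the t/f booleans are stored as 1/0, and the table name `events` follows the answer. The frame clause `rows unbounded preceding` is spelled out, as Redshift requires it for aggregate window functions with ORDER BY.

```python
import sqlite3

# Sample rows from the question: (time, userid, eventid, sessionstart, sessionstop).
# SQLite has no boolean type, so t/f are stored as 1/0 here.
ROWS = [
    (1498639773, "101xnmnd1ohi62", 504747459, 1, 0),
    (1498639777, "101xnmnd1ohi62", 1479311450, 0, 0),
    (1498639803, "101xnmnd1ohi62", 808610184, 0, 0),
    (1498639816, "101xnmnd1ohi62", 335000637, 0, 0),
    (1498639903, "101xnmnd1ohi62", 238269920, 0, 0),
    (1498639906, "101xnmnd1ohi62", 990687838, 0, 0),
    (1498639952, "101xnmnd1ohi62", 781472797, 0, 1),
    (1498650109, "101xnmnd1ohi62", 1826568537, 1, 0),
    (1498650124, "101xnmnd1ohi62", 2079795673, 0, 0),
    (1498650365, "101xnmnd1ohi62", 578922176, 0, 1),
]

# Running count of session starts within each user, ordered by time.
PER_USER_SQL = """
select eventid,
       sum(case when sessionstart then 1 else 0 end)
           over (partition by userid order by time rows unbounded preceding) as user_sessionid
from events
order by userid, time
"""


def per_user_session_ids(rows):
    """Return [(eventid, user_sessionid), ...] computed in an in-memory SQLite table."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table events (time integer, userid text, eventid integer, "
        "sessionstart integer, sessionstop integer)"
    )
    con.executemany("insert into events values (?, ?, ?, ?, ?)", rows)
    result = con.execute(PER_USER_SQL).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for eventid, sid in per_user_session_ids(ROWS):
        print(eventid, sid)
```

On this sample the first seven events (504747459 through 781472797) get user_sessionid 1 and the last three get 2, matching the two sessions described in the question.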
This provides a sessionid "within" each user. If users always start with a new session (a reasonable assumption), then this is easily extended to a global session id:
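-- Same running count, but over all users, so the numbers are unique across users.
select e.*,
       sum(case when sessionstart then 1 else 0 end)
           over (order by userid, time rows unbounded preceding) as global_sessionid
from events e;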
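The global variant can be checked the same way. The sketch below again uses SQLite as a stand-in for Redshift and adds a hypothetical second user, `user_b`, whose event ids and timestamps are made up for illustration (they are not from the question), so that the numbering visibly continues across users. It relies on the same assumption as the answer: each user's first event is a sessionstart.

```python
import sqlite3

# Rows from the question plus a HYPOTHETICAL user "user_b" (made-up data).
# Columns: (time, userid, eventid, sessionstart, sessionstop); booleans as 1/0.
ROWS = [
    (1498639773, "101xnmnd1ohi62", 504747459, 1, 0),
    (1498639777, "101xnmnd1ohi62", 1479311450, 0, 0),
    (1498639803, "101xnmnd1ohi62", 808610184, 0, 0),
    (1498639816, "101xnmnd1ohi62", 335000637, 0, 0),
    (1498639903, "101xnmnd1ohi62", 238269920, 0, 0),
    (1498639906, "101xnmnd1ohi62", 990687838, 0, 0),
    (1498639952, "101xnmnd1ohi62", 781472797, 0, 1),
    (1498650109, "101xnmnd1ohi62", 1826568537, 1, 0),
    (1498650124, "101xnmnd1ohi62", 2079795673, 0, 0),
    (1498650365, "101xnmnd1ohi62", 578922176, 0, 1),
    (100, "user_b", 9001, 1, 0),  # hypothetical
    (200, "user_b", 9002, 0, 1),  # hypothetical
    (300, "user_b", 9003, 1, 0),  # hypothetical
    (400, "user_b", 9004, 0, 1),  # hypothetical
]

# No partition: one running count over all users ordered by (userid, time).
GLOBAL_SQL = """
select userid, eventid,
       sum(case when sessionstart then 1 else 0 end)
           over (order by userid, time rows unbounded preceding) as global_sessionid
from events
order by userid, time
"""


def global_session_ids(rows):
    """Return [(userid, eventid, global_sessionid), ...] from an in-memory SQLite table."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table events (time integer, userid text, eventid integer, "
        "sessionstart integer, sessionstop integer)"
    )
    con.executemany("insert into events values (?, ?, ?, ?, ?)", rows)
    result = con.execute(GLOBAL_SQL).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for userid, eventid, sid in global_session_ids(ROWS):
        print(userid, eventid, sid)
```

The question's user keeps session ids 1 and 2, and the hypothetical `user_b` continues with 3 and 4. If a user's first event were not a sessionstart, those events would be counted into the previous user's last session, which is why the answer's assumption matters.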