I would like to prepare a market basket analysis in Python based on Google Analytics data. I want to examine the most common paths that users take, at a cookie level. I have run into two problems. First, when I query the data from BigQuery, the hit number is on a session level, not a cookie level. How can I show the path a user has taken at a cookie level rather than a session level? Second, I do not know how to prepare the data: in R, the data must be in a transaction class before running the apriori algorithm. I know that in Python the solution is to one-hot encode the data; however, this loses the sequence of the page paths.
Could somebody please help me? Thank you!
I think your best bet for aggregating page paths at a cookie level would be to group by visitor_id (fullVisitorId in the BigQuery export). The visitor_id is what GA assigns as the cookie, and it should persist across visits unless a user browses incognito or clears their cookies. If you use a Custom Dimension to track users who log in to your website, you will see that one user can have multiple visitor_ids (for example, one per browser or device).
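A minimal pandas sketch of grouping by visitor ID, assuming the hit-level rows have already been loaded into a DataFrame with the columns returned by the BigQuery query further down (fullVisitorId, visitId, visitNumber, hitNumber, pagePath). The sample rows are made up for illustration. Each cookie's page views are sorted by session and hit, joined into one path string, and then counted to find the most common complete paths.

```python
import pandas as pd

# Hypothetical sample of hit-level rows, shaped like the BigQuery query output.
# All values are made up for illustration.
hits = pd.DataFrame(
    {
        "fullVisitorId": ["A", "A", "A", "A", "B", "B", "C", "C"],
        "visitId": [10, 10, 20, 20, 30, 30, 40, 40],
        "visitNumber": [1, 1, 2, 2, 1, 1, 1, 1],
        "hitNumber": [1, 2, 1, 2, 1, 2, 1, 2],
        "pagePath": ["/home", "/shop", "/shop", "/cart",
                     "/home", "/shop", "/home", "/shop"],
    }
)

# Sort chronologically within each cookie: session order first, then hit order.
hits = hits.sort_values(["fullVisitorId", "visitNumber", "hitNumber"])

# Roll the ordered page views up to one path string per cookie (fullVisitorId).
paths = hits.groupby("fullVisitorId")["pagePath"].agg(" > ".join)

# Count how many cookies followed each complete path.
path_counts = paths.value_counts()
print(path_counts)
```

Here cookies B and C share the path /home > /shop, so it is counted twice. Counting complete paths is strict, since two paths that differ by a single page are different strings; shorter sub-paths or page-to-page transitions may surface common patterns more readily.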
Before you aggregate, you can use visit_id (visitId in the export) to distinguish between a user's sessions. Query all hit-level data for a given user, then roll it up from there. This could be done by adjusting the WHERE clause of your current hit-level query so that it no longer restricts the results to a single session: you keep hitNumber, but you now get the hits from all sessions. Because hitNumber restarts in every session, order the rows by visitNumber and then by hitNumber to get each cookie's full path in sequence.
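Since hitNumber restarts in every session, one way to get a cookie-level hit number is to sort all of a cookie's hits by visitNumber and hitNumber and then number them consecutively. A minimal pandas sketch, assuming made-up hit-level rows that use the column names from the query further down; cookieHitNumber is a hypothetical column name introduced here for illustration.

```python
import pandas as pd

# Hypothetical hit-level rows covering all sessions of each cookie (made-up values).
hits = pd.DataFrame(
    {
        "fullVisitorId": ["A", "A", "A", "B", "B"],
        "visitNumber": [2, 1, 1, 1, 1],
        "hitNumber": [1, 1, 2, 1, 2],
        "pagePath": ["/cart", "/home", "/shop", "/home", "/blog"],
    }
)

# hitNumber restarts at 1 in every session, so order by session first, then by hit.
hits = hits.sort_values(["fullVisitorId", "visitNumber", "hitNumber"]).reset_index(drop=True)

# cumcount() numbers the rows 0, 1, 2, ... within each cookie; +1 makes it 1-based.
hits["cookieHitNumber"] = hits.groupby("fullVisitorId").cumcount() + 1
print(hits)
```

The same numbering can also be computed in BigQuery itself with a ROW_NUMBER() window function partitioned by fullVisitorId and ordered by visitNumber, hitNumber.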
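#legacySQL
-- Page-view hits from every session of every cookie (fullVisitorId)
SELECT
  fullVisitorId,
  visitId,
  visitNumber,
  hits.hitNumber AS hitNumber,
  hits.page.pagePath AS pagePath
FROM
  -- Legacy SQL: query the daily ga_sessions_YYYYMMDD tables for July 2017
  TABLE_DATE_RANGE([bigquery-public-data:google_analytics_sample.ga_sessions_],
    TIMESTAMP('2017-07-01'), TIMESTAMP('2017-07-31'))
WHERE
  hits.type = "PAGE"
ORDER BY
  -- hitNumber restarts in each session, so order by session before hit
  fullVisitorId,
  visitNumber,
  visitId,
  hitNumber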
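For the second part of the question (apriori expects one-hot encoded baskets, which drop the page order), one workaround, not part of the original answer, is to use ordered page-to-page transitions as the items instead of single pages. "/home -> /shop" and "/shop -> /home" then become different columns, so some sequence information survives the encoding. A minimal sketch with pandas, assuming the per-cookie paths have already been built; the sample paths are made up.

```python
import pandas as pd

# Hypothetical cookie-level paths, e.g. the ordered pagePath values per fullVisitorId.
paths = {
    "A": ["/home", "/shop", "/cart"],
    "B": ["/home", "/shop"],
    "C": ["/shop", "/home"],
}

# Turn each path into its ordered transitions ("from -> to"), so that
# "/home -> /shop" and "/shop -> /home" become different items.
rows = [
    (visitor, f"{src} -> {dst}")
    for visitor, pages in paths.items()
    for src, dst in zip(pages, pages[1:])
]
transitions = pd.DataFrame(rows, columns=["fullVisitorId", "transition"])

# One-hot encode: one row per cookie, one boolean column per transition.
basket = pd.crosstab(transitions["fullVisitorId"], transitions["transition"]) > 0
print(basket)

# Support of each transition = share of cookies whose path contains it.
support = basket.mean().sort_values(ascending=False)
print(support)
```

Cookies with only one page view produce no transitions and drop out of the basket. The boolean DataFrame has the one-row-per-transaction, one-column-per-item shape that the `apriori` function in mlxtend's `frequent_patterns` module accepts, if you choose to use that library.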