
Google Analytics pagepath market basket analysis in Python

I would like to run a market basket analysis in Python on Google Analytics data. I want to find the most common paths users take, at the cookie level. I have run into two problems. First, when I query the data from BigQuery, the hit number is at the session level, not the cookie level. How can I show the path a user has taken at the cookie level rather than the session level? Second, I do not know how to shape the data: in R, a transaction class is needed to prepare the data for the apriori algorithm. I know that in Python the usual solution is to one-hot encode the data, but with that approach the sequence of page paths is lost.

Could somebody please help me? Thank you!

I think your best bet for aggregating page paths at the cookie level is to group by visitor_id (fullVisitorId in the BigQuery export). The visitor_id is the ID that GA assigns to the cookie, and it should persist across visits unless a user goes incognito or clears cookies. If you use a Custom Dimension to track users logging in to your website, you will see that a single user can have multiple visitor_ids.

Before you aggregate, you can use visit_id (visitId in the BigQuery export) to distinguish between sessions. You can query all hit-level data for a given user and then roll up from there.

I think this can be done by adjusting the WHERE clause of the query you use now for the hit level of a session: keep the hit number, but look at all sessions instead of just one.

-- Legacy SQL: table references use [project:dataset.table]
SELECT
  fullVisitorId,
  visitId,
  visitNumber,
  hits.hitNumber AS hitNumber,
  hits.page.pagePath AS pagePath
FROM
  TABLE_DATE_RANGE([bigquery-public-data:google_analytics_sample.ga_sessions_],
    TIMESTAMP('2017-07-01'), TIMESTAMP('2017-07-31'))
WHERE
  hits.type = "PAGE"
ORDER BY
  fullVisitorId,
  visitId,
  visitNumber,
  hitNumber
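
Once you have the rows in Python, the roll-up could look roughly like the sketch below. It is only an illustration, not a definitive implementation: the `rows` list is made-up sample data shaped like the query result, and `paths_by_cookie` and `count_transitions` are hypothetical helper names. Instead of one-hot encoding, it keeps each cookie's pages in order and counts runs of consecutive pages, so the sequence is not lost.

```python
from collections import Counter, defaultdict

# Made-up sample rows shaped like the query result:
# (fullVisitorId, visitNumber, hitNumber, pagePath)
rows = [
    ("A", 1, 1, "/home"),
    ("A", 1, 2, "/product"),
    ("A", 2, 1, "/home"),
    ("A", 2, 2, "/cart"),
    ("B", 1, 1, "/home"),
    ("B", 1, 2, "/product"),
    ("B", 1, 3, "/cart"),
]


def paths_by_cookie(rows):
    """Return {fullVisitorId: [pagePath, ...]} ordered across all sessions."""
    paths = defaultdict(list)
    # Sort by cookie, then session number, then hit number, so hits from
    # later sessions follow hits from earlier ones.
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2]))
    for visitor, _visit_number, _hit_number, page in ordered:
        paths[visitor].append(page)
    return dict(paths)


def count_transitions(paths, n=2):
    """Count runs of n consecutive pages over all cookie-level paths."""
    counts = Counter()
    for pages in paths.values():
        for i in range(len(pages) - n + 1):
            counts[tuple(pages[i:i + n])] += 1
    return counts


cookie_paths = paths_by_cookie(rows)
transition_counts = count_transitions(cookie_paths)

print(cookie_paths)
print(transition_counts.most_common(3))
```

Note that this sketch also counts a transition from the last page of one session to the first page of the next. If you do not want that, group on (fullVisitorId, visitNumber) first and only concatenate afterwards. Raising `n` gives longer page sequences at the cost of sparser counts.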


 