I have data in the form [UID obj1 obj2..] x timestamp, and I want to cluster it in Python using KMeans from sklearn. Where should I start?
EDIT:
So basically I'm trying to cluster users based on clickstream data, and classify them based on usage patterns.
You can derive more features from the raw data using methods like RFM analysis (RFM = recency, frequency, monetary value).
For example:
How often did the user log in?
When did the user last log in?
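As a rough sketch of the idea above, the snippet below derives recency and frequency features per user with pandas and clusters them with sklearn's KMeans. The column names (`user_id`, `event`, `timestamp`) and the sample data are made up for illustration, and the monetary component is left out because this clickstream has no purchase amounts.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical clickstream: one row per event (column names are assumptions).
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4],
    "event": ["login", "view", "cart", "login", "view",
              "login", "view", "cart", "payment_done", "login"],
    "timestamp": pd.to_datetime([
        "2023-01-01 10:00", "2023-01-01 10:05", "2023-01-02 09:00",
        "2023-01-03 12:00", "2023-01-03 12:10",
        "2023-01-10 08:00", "2023-01-10 08:01", "2023-01-10 08:05",
        "2023-01-10 08:10", "2023-01-05 18:00",
    ]),
})

# Use the latest event in the data as the reference point for recency.
now = events["timestamp"].max()

# One row per user: number of events and time of the last event.
features = events.groupby("user_id").agg(
    frequency=("event", "size"),
    last_seen=("timestamp", "max"),
)
features["recency_days"] = (now - features["last_seen"]).dt.total_seconds() / 86400
features = features.drop(columns="last_seen")

# Scale the features so that neither one dominates the distance metric.
X = StandardScaler().fit_transform(features)

# n_clusters=2 is arbitrary here; choose it with e.g. the elbow method.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
features["cluster"] = kmeans.fit_predict(X)
print(features)
```

Scaling matters here because KMeans uses Euclidean distance, so a feature measured in large units would otherwise outweigh the others.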
You can use the Python library Retentioneering (GitHub), which lets you cluster users based on clickstream data with a single command. You can also specify target events of interest for your clusters and explore the results with interactive graphs.
data.rete.get_clusters(method='kmeans',
                       feature_type='tfidf',
                       n_clusters=8,
                       ngram_range=(1, 2),
                       plot_type='cluster_bar',
                       targets=['payment_done', 'cart'])
Next, you can explore the resulting behavioral clusters with an interactive graph:
clus_0 = data.rete.filter_cluster(0)
clus_0.rete.plot_graph(thresh=0.1,
                       weight_col='user_id',
                       targets={'lost': 'red',
                                'payment_done': 'green'})
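If you prefer to stay with plain sklearn, the general idea of TF-IDF features over event n-grams followed by k-means can be sketched as below. This is not Retentioneering's internal implementation, only an illustration of the same kind of feature; the user IDs and event names are hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical per-user event sequences, already ordered by timestamp.
sequences = {
    "u1": ["main", "catalog", "cart", "payment_done"],
    "u2": ["main", "catalog", "cart", "payment_done"],
    "u3": ["main", "catalog", "lost"],
    "u4": ["main", "lost"],
}

# Join each sequence into one "document" so event names act as tokens.
docs = [" ".join(seq) for seq in sequences.values()]

# ngram_range=(1, 2) counts single events and consecutive event pairs;
# token_pattern keeps every whitespace-separated event name as one token.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), token_pattern=r"\S+")
X = vectorizer.fit_transform(docs)

# n_clusters=2 suits this toy data only; tune it for real data.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
clusters = dict(zip(sequences, labels))
print(clusters)
```

Bigrams capture transitions between events (for example "cart payment_done"), so users with similar paths end up close together, not only users with similar event counts.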