
How to cluster a time series using KMeans in Python

I have data in the form [UID obj1 obj2..] x timestamp, and I want to cluster it in Python using KMeans from sklearn. Where should I start?

EDIT:

So basically I'm trying to cluster users based on clickstream data, and classify them based on usage patterns.

You can derive more features from the raw data using methods such as RFM analysis (RFM = recency, frequency, monetary value).

For example:

How often does the user log in?

When did the user last log in?
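As a rough illustration of the idea above, here is a minimal sketch that turns raw clickstream rows into per-user recency and frequency features and clusters them with KMeans from sklearn. The column names (`user_id`, `event`, `timestamp`), the sample data, and the helper functions `build_user_features` and `cluster_users` are all made up for this example; a monetary feature is left out because the question does not mention purchase amounts.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical clickstream: one row per event (column names are assumptions).
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4],
    "event": ["login", "view", "cart", "login", "view",
              "login", "view", "cart", "payment_done", "login"],
    "timestamp": pd.to_datetime([
        "2024-01-01", "2024-01-02", "2024-01-20", "2024-01-03", "2024-01-04",
        "2024-01-10", "2024-01-15", "2024-01-18", "2024-01-21", "2024-01-05",
    ]),
})


def build_user_features(events, now=None):
    """Aggregate raw events into one feature row per user."""
    now = events["timestamp"].max() if now is None else now
    grouped = events.groupby("user_id")
    return pd.DataFrame({
        # Recency: days since the user's most recent event.
        "recency_days": (now - grouped["timestamp"].max()).dt.days,
        # Frequency: total number of events for the user.
        "frequency": grouped.size(),
        # Number of distinct calendar days with activity.
        "active_days": grouped["timestamp"].apply(
            lambda s: s.dt.normalize().nunique()
        ),
    })


def cluster_users(features, n_clusters=2, random_state=0):
    """Scale the features and assign each user to a KMeans cluster."""
    scaled = StandardScaler().fit_transform(features)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    return pd.Series(model.fit_predict(scaled), index=features.index, name="cluster")


features = build_user_features(events)
labels = cluster_users(features)
print(features.join(labels))
```

Scaling matters here because KMeans uses Euclidean distance, so an unscaled feature with a large range (such as recency in days) would otherwise dominate the others. The number of clusters is a guess; in practice you would compare several values, for example with the elbow method or silhouette scores.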

You can use the Python library Retentioneering (GitHub), which lets you cluster users based on clickstream data with a single command. You can also specify the target events you are interested in for your clusters and explore the results with interactive graphs.

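# Assumes `data` is a pandas DataFrame of clickstream events
# with the Retentioneering accessor (.rete) available.
data.rete.get_clusters(method='kmeans',
                       feature_type='tfidf',
                       n_clusters=8,
                       ngram_range=(1, 2),
                       plot_type='cluster_bar',
                       targets=['payment_done', 'cart'])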

[Image: results of user clustering]
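The `feature_type='tfidf'` and `ngram_range=(1,2)` arguments suggest that each user's event sequence is treated like a text document. The sketch below shows that general idea using only sklearn; it is not Retentioneering's actual implementation, and the sample data and column names are assumptions made for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical clickstream: one row per event (column names are assumptions).
events = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4],
    "event": ["login", "view", "cart", "payment_done",
              "login", "view",
              "login", "view", "cart", "payment_done",
              "login", "view", "lost"],
    "timestamp": pd.date_range("2024-01-01", periods=13, freq="h"),
})

# Build one "sentence" of events per user, ordered by time.
sequences = (
    events.sort_values(["user_id", "timestamp"])
    .groupby("user_id")["event"]
    .agg(" ".join)
)

# Unigrams capture which events occur; bigrams capture which event follows which.
# token_pattern=r"\S+" keeps every whitespace-separated event name as one token.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), token_pattern=r"\S+")
matrix = vectorizer.fit_transform(sequences)

# KMeans accepts the sparse TF-IDF matrix directly.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = pd.Series(model.fit_predict(matrix), index=sequences.index, name="cluster")
print(labels)
```

Users with identical event sequences get identical TF-IDF vectors and therefore always land in the same cluster. The choice of two clusters is arbitrary for this toy data.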

Next, you can explore the resulting behavioral clusters with an interactive graph:

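# Select the users assigned to cluster 0 and plot their behavior graph,
# highlighting the target events in the given colors.
clus_0 = data.rete.filter_cluster(0)
clus_0.rete.plot_graph(thresh=0.1,
                       weight_col='user_id',
                       targets={'lost': 'red',
                                'payment_done': 'green'})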

[Image: graph visualization example]
