How can I make clusters of time frames?

Question

I have a pandas DataFrame of timestamps:

0    2020-08-01 23:59:59
1    2020-08-01 23:59:49
2    2020-08-01 20:52:17
3    2020-08-01 19:02:34
4    2020-08-01 18:38:06

I want to add a column that assigns each timestamp to a cluster, for example:

0    2020-08-01 23:59:59   1
1    2020-08-01 23:59:49   1
2    2020-08-01 20:52:17   2
3    2020-08-01 19:02:34   3
4    2020-08-01 18:38:06   3

In this example, three clusters can be formed by grouping the timestamps that are closest to each other.

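import pandas as pd
from sklearn.cluster import KMeans

# KMeans needs a 2-D numeric array, so convert each timestamp to seconds
# since the earliest one and reshape to (n_samples, 1).
ts = pd.to_datetime(df['datetime'])
X = (ts - ts.min()).dt.total_seconds().to_numpy().reshape(-1, 1)

kmeans = KMeans(n_clusters=3, n_init=10)
# Labels start at 0 and their numbering is arbitrary.
df['cluster'] = kmeans.fit_predict(X)

My first attempt called .iloc on df['datetime'].values, which fails because .values returns a NumPy array, not a DataFrame; the version above converts the timestamps to numbers first. However, I have millions of rows and don't know in advance how many clusters I need. I read that the Elbow Method can be used, but I'm not sure exactly how to apply it. Can someone explain how it can be done?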

Answer 1

k-means assumes that you already know the number of clusters.
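If you still want to use k-means, the Elbow Method mentioned in the question picks k by fitting k-means for several values of k and looking for the point where the inertia (within-cluster sum of squares) stops dropping sharply. A minimal sketch on the five sample timestamps, assuming the column is named datetime as in the question's code:

```python
import pandas as pd
from sklearn.cluster import KMeans

# The five sample timestamps from the question.
df = pd.DataFrame({"datetime": pd.to_datetime([
    "2020-08-01 23:59:59", "2020-08-01 23:59:49", "2020-08-01 20:52:17",
    "2020-08-01 19:02:34", "2020-08-01 18:38:06",
])})

# Express each timestamp as seconds since the earliest one; KMeans needs a 2-D array.
ts = df["datetime"]
X = (ts - ts.min()).dt.total_seconds().to_numpy().reshape(-1, 1)

# Fit KMeans for a range of k and record the inertia for each fit.
ks = range(1, len(X))  # k must not exceed the number of samples
inertias = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)

# Plot or print inertia against k and pick the k at the "elbow",
# where adding another cluster no longer reduces inertia much.
for k, inertia in zip(ks, inertias):
    print(f"k={k}: inertia={inertia:.0f}")
```

On millions of rows this loop gets expensive; scikit-learn's MiniBatchKMeans also exposes inertia_ and could be tried as a cheaper substitute, though the elbow is often not sharp on real data.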

If you want a method that determines the number of clusters algorithmically, you can use, e.g., DBSCAN, which forms a cluster whenever a group of data points lies "close" to each other (closeness is controlled by the eps parameter). If you have a large number of samples and this is very costly, you can also explore the clusters on a smaller, representative subset of the data first.
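A minimal DBSCAN sketch on the question's sample data follows. The 30-minute eps (EPS_SECONDS, a hypothetical name) is an assumption chosen for illustration; with it, DBSCAN recovers the grouping the question asks for. Because the data is one-dimensional, the block also shows an equivalent sort-and-split-on-gaps approach, which is a swapped-in technique rather than part of DBSCAN itself:

```python
import pandas as pd
from sklearn.cluster import DBSCAN

# The five sample timestamps from the question.
df = pd.DataFrame({"datetime": pd.to_datetime([
    "2020-08-01 23:59:59", "2020-08-01 23:59:49", "2020-08-01 20:52:17",
    "2020-08-01 19:02:34", "2020-08-01 18:38:06",
])})

EPS_SECONDS = 1800  # assumption: timestamps within 30 minutes are "close"

# DBSCAN needs a 2-D numeric array: seconds since the earliest timestamp.
ts = df["datetime"]
X = (ts - ts.min()).dt.total_seconds().to_numpy().reshape(-1, 1)

# min_samples=1 makes every point a core point, so nothing is labelled noise (-1).
db = DBSCAN(eps=EPS_SECONDS, min_samples=1).fit(X)
df["cluster"] = db.labels_ + 1  # shift so cluster ids start at 1, as in the question

# In 1-D with min_samples=1, a DBSCAN cluster is a run of sorted timestamps whose
# consecutive gaps are all <= eps, so sorting and splitting on larger gaps gives
# the same grouping in O(n log n). Sorting newest-first makes the ids match on
# this sample; in general the two methods may number the same clusters differently.
order = df["datetime"].sort_values(ascending=False)
gap_seconds = order.diff().dt.total_seconds().abs()
df["cluster_gap"] = (gap_seconds > EPS_SECONDS).cumsum() + 1  # aligned on the index

print(df)
```

Here eps plays the role that n_clusters plays in k-means: a smaller eps splits the data into more, tighter clusters. For millions of rows, the sort-and-diff version avoids DBSCAN's neighbour search, and tuning eps on a representative subset (for example via df.sample) still applies.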

1 answers

solution1
0 2020-09-13 14:43:55
