
How can I cluster timestamps in a pandas DataFrame?

I have a pandas DataFrame of timestamps:

0    2020-08-01 23:59:59
1    2020-08-01 23:59:49
2    2020-08-01 20:52:17
3    2020-08-01 19:02:34
4    2020-08-01 18:38:06

I want to add a column that assigns each row a cluster index, for example:

0    2020-08-01 23:59:59   1
1    2020-08-01 23:59:49   1
2    2020-08-01 20:52:17   2
3    2020-08-01 19:02:34   3
4    2020-08-01 18:38:06   3

I wrote this example by hand: three clusters can be formed here by grouping the timestamps that are closest to each other.

from sklearn.cluster import KMeans
mat = df['datetime'].values
kmeans = KMeans(n_clusters=3)
kmeans.fit(mat.iloc[:,1:])
y_kmeans = kmeans.predict(mat.iloc[:,1:])

df['cluster'] = y_kmeans   

However, the code above didn't work. I have millions of rows and don't know in advance how many clusters I need. I read that the elbow method can be used, but I'm not sure exactly how. Can someone show me how it can be done?

KMeans assumes that you know the number of clusters in advance.
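If you still want to try KMeans, here is a minimal sketch of the elbow method mentioned in the question. It assumes the column is named `datetime` and converts the timestamps to seconds since the earliest one, because KMeans needs a 2-D numeric array (the `.values` call in the question returns a NumPy array, which has no `.iloc`). The range of `k` values, `n_init=10` and `random_state=0` are illustrative choices, not recommendations.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Sample data from the question; the column name 'datetime' is an assumption.
df = pd.DataFrame({"datetime": [
    "2020-08-01 23:59:59",
    "2020-08-01 23:59:49",
    "2020-08-01 20:52:17",
    "2020-08-01 19:02:34",
    "2020-08-01 18:38:06",
]})

# KMeans needs a 2-D numeric array: use seconds since the earliest timestamp.
ts = pd.to_datetime(df["datetime"])
X = (ts - ts.min()).dt.total_seconds().to_numpy().reshape(-1, 1)

# Elbow method: fit KMeans for several k and record the inertia
# (within-cluster sum of squared distances) for each k.
inertias = {}
for k in range(1, 5):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, inertia in inertias.items():
    print(k, round(inertia))

# After inspecting the curve, refit with the chosen k (3 here, picked by eye).
df["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(df)
```

The "elbow" is the value of k after which the inertia stops dropping sharply. Note that KMeans labels are arbitrary integers starting at 0, so they will not necessarily follow the order of the timestamps.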

If you want a method that determines the number of clusters algorithmically, you can use DBSCAN, for example. It forms a cluster whenever a group of data points is "close" to each other, with closeness determined by the eps parameter. If you have a large number of samples and running it on all of them is too costly, you can first explore the cluster structure on a smaller, representative subset of the data.
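As a rough sketch of the DBSCAN approach, assuming the column is named `datetime` and that timestamps within 30 minutes of each other should share a cluster (the `eps=1800` seconds value is an assumption you would tune for your data):

```python
import pandas as pd
from sklearn.cluster import DBSCAN

# Sample data from the question; the column name 'datetime' is an assumption.
df = pd.DataFrame({"datetime": [
    "2020-08-01 23:59:59",
    "2020-08-01 23:59:49",
    "2020-08-01 20:52:17",
    "2020-08-01 19:02:34",
    "2020-08-01 18:38:06",
]})

# DBSCAN needs a 2-D numeric array: use seconds since the earliest timestamp.
ts = pd.to_datetime(df["datetime"])
X = (ts - ts.min()).dt.total_seconds().to_numpy().reshape(-1, 1)

# eps is the maximum gap (in seconds) between neighbouring points in a cluster.
# min_samples=1 lets a single isolated timestamp form its own cluster
# instead of being labelled as noise (-1).
labels = DBSCAN(eps=1800, min_samples=1).fit_predict(X)

# DBSCAN labels start at 0; add 1 to number the clusters from 1.
df["cluster"] = labels + 1
print(df)
```

On the five sample rows this produces three clusters: the two timestamps just before midnight, the one at 20:52, and the two between 18:38 and 19:02. The number of clusters is never specified; it follows from eps.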


 