KMeans clustering by group in a pandas dataframe

Question

I want to cluster the X3 column of the dataframe below. How can I do that?

import pandas as pd

df = pd.DataFrame({'Month': [1,1,1,1,1,1,3,3,3,3,3,3,3], 'X1': [10,15,24,32,8,6,10,23,24,56,45,10,56],
                   'X2': [12,90,20,40,10,15,30,40,60,42,2,4,10], 'X3': [34,65,34,87,100,65,78,67,34,98,96,46,76]})

Below is what I tried, but it did not work:

cols=df.columns[3]

def cluster(X):
    k_means = KMeans(n_clusters=3).fit(X)
    return X.assign(clusters=k_means.labels_)

df['cluster_id'] = df.groupby('Month')[cols].apply(cluster)

Please help. Thanks.

Answer 1

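sklearn's KMeans generally expects the features as a two-dimensional array, not the one-dimensional vector you are passing, so you need to reshape X into a 2-D array. Also, if you want to rely on the groupby/apply combine mechanism, it is easier to do the column indexing inside the applied function, because assigning the result of such an operation back to the dataframe is awkward.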

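from sklearn.cluster import KMeans

cols = df.columns[3]  # 'X3'

def cluster(X):
    # KMeans needs a 2-D array of shape (n_samples, n_features)
    feature = X[cols].to_numpy().reshape(-1, 1)
    k_means = KMeans(n_clusters=3).fit(feature)
    # return a copy with the labels attached instead of mutating the group
    return X.assign(cluster=k_means.labels_)

# group_keys=False keeps the original row index on the combined result
df = df.groupby('Month', group_keys=False).apply(cluster)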
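
To see the shape issue in isolation, here is a minimal sketch that uses only the Month 1 values of X3 from the question. The variable names are illustrative and not part of the code above.

```python
import pandas as pd

# X3 values for Month 1, copied from the question's dataframe
x3 = pd.Series([34, 65, 34, 87, 100, 65], name="X3")

flat = x3.to_numpy()          # 1-D array: a single axis
column = flat.reshape(-1, 1)  # 2-D array: one row per sample, one feature

print(flat.shape)    # (6,)
print(column.shape)  # (6, 1)

# Selecting with a list of column names keeps the data 2-D from the start
frame_2d = x3.to_frame()[["X3"]].to_numpy()
print(frame_2d.shape)  # (6, 1)
```

The (n_samples, 1) form is the one that can be passed to KMeans.fit when there is a single feature.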

Answer 2

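You can use GroupBy.transform to build the cluster labels. The changes to your function are:

reshape the incoming column values to (n_samples, 1) so that sklearn is happy
do not assign the resulting k_means.labels_ to anything inside the function; return it instead so that transform can use it

So: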

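from sklearn.cluster import KMeans

def cluster(X, n_clusters):
    # reshape the group's values to (n_samples, 1), as sklearn expects
    k_means = KMeans(n_clusters=n_clusters).fit(X.values.reshape(-1, 1))
    # return the labels so that transform can align them with the rows
    return k_means.labels_

cols = pd.Index(["X3"])
df[cols + "_cluster_id"] = df.groupby("Month")[cols].transform(cluster, n_clusters=3)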

We use pd.Index rather than a Python list because it makes it easy to append the string "_cluster_id" to every element of cols.
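
As a small illustration of that point, the sketch below compares the two approaches. The second column name "X1" is added only for the example and is not clustered anywhere in this answer.

```python
import pandas as pd

# Adding a string to an Index of strings is applied element by element
cols = pd.Index(["X1", "X3"])
new_cols = cols + "_cluster_id"
print(list(new_cols))

# A plain list does not support this: list + str raises TypeError
try:
    ["X1", "X3"] + "_cluster_id"
    list_add_failed = False
except TypeError:
    list_add_failed = True

# With a list, the suffix has to be added explicitly, e.g. with a comprehension
list_cols = [name + "_cluster_id" for name in ["X1", "X3"]]
```

Both routes end with the same names; the Index version just avoids the comprehension.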

This gives:

    Month  X1  X2   X3  X3_cluster_id
0       1  10  12   34              1
1       1  15  90   65              0
2       1  24  20   34              1
3       1  32  40   87              2
4       1   8  10  100              2
5       1   6  15   65              0
6       3  10  30   78              2
7       3  23  40   67              2
8       3  24  60   34              0
9       3  56  42   98              1
10      3  45   2   96              1
11      3  10   4   46              0
12      3  56  10   76              2
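
For reference, here is a self-contained sketch of the same idea that can be run from scratch. It is a variation rather than the exact code above: it groups a single column (a Series), and the `seed` parameter, the explicit `n_init=10`, and the use of `fit_predict` are assumptions added to make runs repeatable. KMeans label numbers are arbitrary, so the ids it produces may not match the table above even though the groupings should be comparable.

```python
import pandas as pd
from sklearn.cluster import KMeans

df = pd.DataFrame({
    "Month": [1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3],
    "X3": [34, 65, 34, 87, 100, 65, 78, 67, 34, 98, 96, 46, 76],
})

def cluster(col, n_clusters, seed=0):
    # col is the X3 Series of one Month group; KMeans needs shape (n_samples, 1)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return km.fit_predict(col.to_numpy().reshape(-1, 1))

# transform calls cluster once per Month and aligns the labels with the rows
df["X3_cluster_id"] = df.groupby("Month")["X3"].transform(cluster, n_clusters=3)

# Each month is clustered independently, so each month has its own 3 labels
labels_per_month = df.groupby("Month")["X3_cluster_id"].nunique()
print(df)
print(labels_per_month)
```

Because each month is fitted separately, cluster id 0 in Month 1 has no relation to cluster id 0 in Month 3.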

Answer 1
1 2021-05-24 07:40:47

Answer 2
1 Accepted 2021-05-24 08:20:09