
How do I run a kmeans cluster in Python using Pearson 1 - r as a distance measure?

KMeans uses Euclidean distance as its default distance measure. How do I use Pearson 1 - r as my distance measure instead?

Here is the code that creates the dataset I am clustering:

import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
    
df = [
    [-18.57,9.52,-8.21,8.78,-2.36,-7.05,-17.03,-1.1,-6.89,-2.97,-3.3,9.34,-4.73,-20.79,2.62,0.38],
    [-10.39,-5.93,-6.21,-0.69,-0.85,-0.75,14.78,12.05,-13.98,18.3,17.19,-3.98,-11.17,0.68,-11,-15.82],
    [8.22,11.68,16.48,11.85,14.33,1.97,8.11,22.31,2.02,15.7,2.1,6.65,15.61,-1.2,0.43,1.87],
    [-6.45,-14.31,2.83,-13.21,3.7,5.54,6.6,-9.83,-7.25,8.75,10.04,-5.07,3.48,-5.19,4.61,-7.7],
    [13.68,2.96,6.57,-4.24,11.77,3.56,14.09,20.19,9.28,3.74,13.66,13.64,-3.29,-12.48,-20.91,-1.42],
    [28.64,4.72,3.46,17.37,3.11,17.24,1.53,-25.74,-0.57,-21.77,-17.68,12.43,-1.45,-5.21,0.3,3.12],
    [16.56,9.54,16.8,4.39,6.71,0.43,3.3,12.89,8.52,27.47,15.04,11.11,3.37,-6.14,3.59,2.81],
    [-40.77,13.79,-21,-37.82,-50.74,-17.13,-5.3,3.55,-17.08,-29.07,-4.07,-14.25,-3.58,-71.49,-9.25,-47],
    [-39.49,-25.46,-30.24,-56.55,-13.54,-52.72,-17.55,-4.96,-28.14,6.16,-8.53,-13.89,-13.99,-10.18,-11.39,-43.91],
    [21.1,1.99,8.08,29.73,24.42,19.85,-32.15,-8.57,-6.13,-20.23,7.98,-11.71,23.97,-16.09,-9.97,8.21],
    [-7.9,26.81,-8.44,29.9,7.22,29.51,28.51,-1.4,9.59,29.11,-10.89,12.81,-6.73,6.19,9.3,3.95],
    [15.27,-22.13,37,11.33,33.54,1.89,22.28,22.86,16.57,2.56,-30.32,-3.09,5.07,-16.67,22.85,-17.75],
    [5.96,-11.2,10.22,2.38,0.07,-2.64,10.06,6.68,9.6,4.12,4.42,0.08,4.71,-1.35,4.99,-9.42],
    [0,0,0,0,0,0,-0.01,0,0,-0.01,0,0,0,-0.01,-0.01,-0.01],
    [-4.74,18.97,39.63,21.08,29.1,1.27,-4.46,3.89,10.54,10.6,8.34,1.38,6.22,14.52,17.05,-0.08],
    [-3.24,-14.88,2.16,-11.61,-9.81,-14.37,-17.09,12.07,3.42,-5.72,-5.48,-10.48,-6.66,-8.06,-13.81,-0.21],
    [12.22,7.54,31.69,10.12,25.69,41.43,5.7,5.45,7.08,8.77,19.17,-3.6,-3.06,18.45,21.46,20.19],
    [16.83,-1.17,-13.07,-16.95,8.9,5.94,5.91,-44.14,-13.96,10.38,4.96,-4,-4.51,-6.55,6.48,-6.27],
    [0.67,-3.42,1.65,-2,-6.9,-0.66,-0.11,-2.74,-1.85,-2.75,-1.06,0.94,5.46,0.4,4.5,0.28],
    [-24.39,5.54,-24.26,12.7,3.24,15.8,-3.91,21.23,2.36,-8.67,-28.88,-2.56,33.99,-37.64,-11.71,49.47],
    [14.94,11.97,5.72,15.14,-8.13,11.24,8.87,-3.93,3.49,6.73,-0.59,11.51,2.01,4.15,2.28,1.15],
    [13.32,8.01,21.58,-6.09,-4.85,1.33,19.33,10.86,1.4,9.54,14.72,-4.08,15.87,0.74,6.64,-1.82],
    [0.04,-0.38,-43.2,21.39,21.33,-10.31,-13.36,-14.64,6.46,-26.01,-7.2,24.12,-18.89,21.96,-14.02,-2.15],
    [-0.99,-20.83,-24.37,3.25,8.02,-20.09,-0.01,7.74,3.74,-2.28,4.69,-3.64,11.76,2.37,0.57,3.81],
    [14.61,76.96,-17.45,35.03,-11.66,22.17,81.62,41.09,26.51,113.77,33.79,21.93,85.3,66.51,27.12,79.69],
    [8.63,7.96,-4.87,-18.66,-32.52,-30.18,20.01,22.05,11.71,22.74,-2.08,6.48,16.52,-11.13,-13.15,-10.51],
    [-21.46,-2.54,3.9,-1.91,-5.77,-6.35,-4.88,3.54,-4.26,6.8,-3.41,-6.57,4.59,-7.24,-2.3,0.78],
    [0.01,0.02,0.01,0.01,0.01,0.01,0.01,0,0.01,0,0.01,0.01,0,0,0,0],
    [13.48,-40.87,6.93,7.04,6.5,-47.92,3.19,0.61,11.81,1.52,9.15,-2.59,9.33,3.62,-1.98,-3.34],
    [11.59,51.87,-46.24,-47.84,-23.78,-30.11,-28.11,-27.29,-53.84,-34.16,-18.56,-37.88,2.06,1.64,-20.32,-8.18],
    [-13.05,13.46,7.18,32.84,15.86,-1.05,27.07,41.66,35.41,18.79,19.64,11.75,-0.2,-0.2,-11.77,13.54],
    [-16.57,25.67,14.41,-19.52,15.72,-4.23,-8.55,5.17,-20.68,-6.25,-11.73,-0.41,7.76,5.11,13,1.12],
    [-1.7,6.04,9.95,2.33,-3.41,-8.21,-2.71,-12.26,-2.84,-9.86,-7.89,-5.53,0.6,-13.63,-1.87,0.3],
    [13.34,7.19,-80.45,-26.28,13,22.6,-46,10.27,-13.04,-82.17,-67.98,-58.56,-55.26,-47.02,-47.34,-69.9],
    [34.21,-1.48,-46.19,13.45,17.61,-17.42,-4.38,-3.55,9.06,2.21,-11.01,-9.1,18.9,19.7,-6.63,-9.39],
    [9.25,-10.48,-12.27,12.28,17.97,-7.23,-40.1,-0.23,-11.43,-6.45,4.56,-7.29,19.55,29.88,-7.88,-1.26],
    [18.79,8.92,7.82,17.79,14.14,9.06,8.06,-3.74,-13.73,-6.35,10.72,-0.16,1.57,-13.75,-3.6,0.74],
    [-1.13,-0.89,-2.07,2.01,-1.15,0.04,-6.81,-7.11,-5.01,-9.74,-3.05,-7.13,-6.15,-3.81,-2.75,-7.35],
    [10.23,4.95,1.57,13.4,14.48,7.45,1.24,11.21,-9.81,13.97,5.08,7.18,11.04,-0.86,14.34,12.32],
    [55.8,10.99,11.57,17.2,17.3,19.14,-15.69,-31.38,12.06,-13.02,55.31,-28.04,60.37,-13.24,48.88,35.15],
    [8.63,-8.41,3.74,-2.86,-24.96,-10.55,-26.28,-5.56,18,-6.03,5.86,-25.2,9.92,19.6,11,-28.53],
    [-10.16,-8.44,-15.74,-19.8,15.89,-21.25,-39.72,8.26,-2.72,-10.25,-1.09,-5.7,-8.65,-35.39,-25.59,-20.3],
    [0.01,0,0.01,0.01,0.01,0,-0.01,0,-0.01,0,0,0,0,0,0,0],
    [16.17,12.83,-19.15,15.76,14.76,-17.06,3.93,-1.58,1.41,-6.73,9.76,7.61,-4.91,-4.24,7.03,-18.56],
    [-60.2,32.39,19.37,-67.14,-60.57,-78.21,-6.32,-22.76,-25.68,-10.23,-20.87,-26.51,-22.37,-10.01,-16.48,17.87],
    [21.21,-11.51,15.82,20.46,18.54,-5.08,17.08,-6.48,10.89,4.36,2.18,0.06,1.04,1,3.76,4.19],
    [5.43,0.96,2.11,9.72,3.48,3.71,1.71,2.56,1.86,-2.55,3.66,-10.26,-6.51,-0.35,-8.71,-1.27],
    [61.57,18.02,19.57,19.6,-12.3,24.53,-4.21,45.69,10.23,25.56,-8.95,48.96,23.45,27.48,17.44,24.11],
    [8.06,31.4,-11.45,4.95,-1.67,24.64,1.46,10.26,-1.19,14.72,9.43,5.65,-4.6,0.47,-12.54,26.99],
    [33.03,46.74,4.12,16.93,20.06,3.79,27.1,14.62,17.03,0.83,27.22,37.44,18.77,-8.64,28.01,46.72],
    [20.47,-1.63,13.98,18.03,34.63,-3.58,2.81,15.76,5.06,1.93,5.69,15.91,-7.13,-2.34,-10.34,4.65],
    [4.05,18.22,10.98,-2.32,-23.29,1.4,-3.13,8.06,18.36,-5.57,-9.27,1.18,5.9,13.45,-6.41,-4.72],
    [0,0.01,0.01,0.01,0,0,0,0,0,0.01,0,0,0.01,0.01,0,0],
    [0.26,-6.33,-13.29,-4.43,-9.18,-10.85,3.93,-8.68,2.87,-3.03,-5.37,5.9,-1.1,-4.66,-7.53,-4.7],
    [-9.3,2.8,-14.15,2.74,4.22,-22.25,20.12,19.42,6.45,23.61,18.08,29.85,-1.56,5.5,-3.08,11.77],
    [17.64,22.46,9.11,0.76,15.12,13.83,10.38,8.96,8.25,8.18,-3.35,13.55,10.67,9.8,3.97,5.27],
    [12.94,13.88,-21.7,16.84,19.28,-12.12,10.07,49.67,41.56,25.51,61.08,49.79,4.54,36.31,37.6,4.01],
    [5.58,-1.96,9.13,1.65,3.01,2.11,1.71,10.5,4.63,-0.17,-8.77,-0.62,6.93,8.36,16.22,15.07],
    [-20.68,20.62,24.43,27.04,-32.43,-18.51,-14.19,-26.4,31.41,-7.56,-22.96,-8.94,-6.12,-36.15,-45.81,1.42],
    [7.2,8.79,-20.42,11.68,0.55,-75.51,-10.58,-9.71,-18.38,-24.24,-42.46,-15.57,12.14,-14.51,-2.98,15.12],
    [-26.89,-33.28,24.64,-59.11,-47.77,-26.66,2.83,-39.99,-25.83,21.13,-14.3,-30.86,-22.53,-56.17,-34.8,-45.64],
    [-12.28,30.44,30.88,-24.51,-32.08,9.78,-27.91,-11.13,-34.77,-12.93,-12.15,-28.02,-6.37,-13.78,-12.84,-27.83],
    [1.7,-7.1,-0.84,-12.24,-5.13,8.77,12.3,4.85,13.77,-1.39,7.01,12.57,0.28,-0.14,-3.64,-0.9],
    [-0.47,11.99,4.51,2.46,1.39,-24.4,20.08,23.82,29.97,36.06,8.7,18.47,6.38,-3.18,11.64,-17.1],
    [-27.61,-7.38,2.21,-11.68,-6.41,-11.74,10.12,8.07,7.86,-7.17,-6.29,14.38,-10.37,-30.33,-47.19,-11.98],
    [8.65,13.64,-15.22,4.58,10.96,26.26,-2.85,-6.44,-0.85,4.49,-14.29,2.16,11.69,-6.92,-1.01,-21.14],
    [-1.95,3.67,-25.82,-21.81,-21.11,-47.44,23.96,13.47,2.45,13.66,-3.16,2.91,-7.36,-9.85,25.83,-7.39],
    [22.62,0.82,-21.09,30.31,-8.33,-44.71,27.17,12.66,14.21,34.05,21.33,22.24,26.45,24.36,3.77,41.94],
    [-17.92,-7.44,-15.61,-19.85,-6.97,-14.22,-8.59,-8.89,-29,-17.77,-3.2,-16.44,-6.98,-2.87,-18.88,-20.35],
    [-0.35,-3.41,-2.33,-1.94,-4.34,-3.55,1.74,-1.45,5.18,5.67,6.52,6.04,0.64,0.78,-0.13,3.09],
    [11.73,10.42,46.98,11.51,-11.17,-26.32,6.09,2.12,1.82,-14,-4.37,3.31,-21.81,-28.53,-10.56,-43.69],
    [24.23,-42.59,33.48,-54.32,-57.6,-12.85,-8.09,-7.93,-14.93,-4.29,-12.43,-18.05,11.09,0.65,-4.94,-2.79],
    [12.19,0.39,23.18,53.83,6.23,-23.02,-57.15,3.6,-27.82,13.48,-7.3,-36.85,14.3,-7.77,-20.13,-5.76],
    [-38.24,1.24,21.2,14.62,3.56,-9.05,-2.59,-6.15,-1.37,-0.23,3.33,-5.62,-13.03,-18.48,3.9,2.89],
    [-33.71,-12.24,-9.28,-24.85,-42.5,-21.21,12.21,27.77,-7.97,-9.28,-15.5,-19.15,-16.41,-27.15,-23.34,-18.88],
    [-1.61,15.61,-4.87,0.9,6.35,2.05,4.23,11.15,1.48,3.78,5.32,5.48,-7.3,3.69,3.46,-1.2],
    [15.14,5.88,-5.54,0.05,-3.69,-2.45,3.32,5.2,-8.32,-6.08,1.9,-0.16,-7.04,-3.66,1.64,-4.04],
    [-1.11,-0.65,-14.1,5.75,-15.76,-18.43,0.66,-11.11,-16.79,-4.55,-15.45,3.31,-38.51,-6.38,-12.77,-10.05],
    [-6.34,-17.25,-12.04,-20.64,-1.92,-11.47,-28.58,-24.49,-17.71,-15.78,-19.29,-33.82,13.74,-11.79,12.85,10.34],
    [-10.29,3.84,-32.88,-32.93,-30,-4.7,-4.32,-21.49,-45.84,3.35,-16.18,-6.53,-2.3,-7.35,-3.64,-9.16],
    [-46.92,-2.36,-4.28,2.84,-15.83,-29.97,20.77,6.51,29.68,3.2,-5.16,-10.39,9.16,16.48,5.3,13.95],
    [13.01,-40.22,32.27,5.63,37.17,-56.47,-16.05,-17.26,-21.8,-8.92,-3.98,-7.31,-6.07,-0.01,-2.06,2.04],
    [21.69,21.66,34.16,11.93,23.6,27.84,2.62,3.98,-3.79,8.24,25.62,-4.28,11.11,14.4,13.57,0.62],
    [-5.67,29.84,26.73,3.57,8.37,6.88,-8.18,-4.41,8.79,9.4,11.87,6.42,2.27,-0.38,7.05,2.52],
    [-8.74,54.09,47.21,-35.65,-52.82,-41.96,25.13,1.77,15.75,5.18,-12.33,14.95,4.07,-12.57,9.97,5.6],
    [29.21,-12.92,-11.2,-10.47,-12.31,-8.17,-2.83,-8.98,-10.44,-3.55,-6.5,0.99,-1.74,-10.88,0.92,2.4],
    [0.01,0,0,0,0.01,0,-0.01,0,0,0,0,0,0,0,0,-0.01],
    [-11.94,-13.01,-33.93,-43.53,-43.97,-41.08,26.01,14.4,39.87,-1.71,15.39,-1.83,6.48,12.93,1.12,29],
    [1.13,-3.85,-3.35,-2.46,1.14,-3.27,7.31,4.23,13.38,-4.91,6.95,9.76,1.1,8.73,4.16,-5],
    [21.42,15.73,7.2,5.94,13.61,-19.66,8.54,-4.47,19.25,13.81,-10.73,4.22,18.25,17.23,28.46,11.97],
    [11.99,10.46,-6.35,20.49,20.93,-7.78,-2.78,14.79,11.05,12.37,0.46,2.28,0.17,-11.8,5.78,-8.59],
    [-8.57,-31.61,1.1,8.31,-11.16,-1.87,15.49,-15.97,2.61,4.16,-1.87,-16.33,-3.76,-7.44,2.06,28.74],
    [0.11,-17.11,18.2,64.64,21.97,-27.5,-11.85,4.58,-2.84,-8.18,-25.06,-26.06,10.45,-7.56,-9.64,10.6],
    [14.99,13,67.29,-7.09,16.44,-23.13,17.07,47.12,22.95,-11.13,-23.63,18.28,15.61,38.36,-52.48,53.09],
    [-64.78,-4.27,-2.07,-46.84,-61.44,-52.13,-7.36,-19.99,-17.42,-2.59,-11.06,-22.44,-9.7,-35.45,-20.95,-0.19],
    [-34.46,-47.12,-35.12,-13.34,5.65,-65.27,-5.57,-33.78,-29.28,-33.81,3.26,-12.11,-41.51,-26.73,-18.81,-28.69],
    [-21.99,-42.13,65.12,6.57,0.6,-12,57.67,36.86,48.95,27.01,39.84,19.77,5.45,19.08,7.78,53.13],
    [20.87,-14.93,-15.89,3.81,-3.77,-14.89,-7.4,0.26,4.5,-8.21,5.59,10.55,7.43,11.85,0.03,3.51],
    [0.92,-2.1,5.42,-4.96,-3.77,-5.1,4.65,-0.25,-11.24,3.08,1.59,2.2,-1.53,-2.46,2.51,27.77],
    [0,-0.01,0,0,0,-0.01,0.01,0,0.01,0.01,0.01,0,0,0,0,0],
    [1.14,-10.03,-7.34,-10.15,-9.05,-7.9,-6.38,-2.59,-1.08,-12.09,-1.27,-12.8,-4.88,-6.15,-14.12,-0.02],
    [-19.98,18.69,17.72,23.8,13.6,1.2,-8.16,-0.79,14.61,9.2,2.96,-16.79,5.02,-4.89,6.8,-26.16],
    [-49.93,-40.07,14.75,19.78,14.79,-8.72,17.37,26.85,9.83,16.88,34.47,-3.39,-3.72,5.98,6.57,22.15],
    [-8.57,2.05,-0.23,5.15,3.08,1.21,-0.97,-0.96,-2.7,2.49,0.61,-2.72,-0.37,4.23,3.3,-0.74],
    [-30.08,-14.15,-5.4,-9.87,-2.38,-27.4,8.14,9.04,3.82,18.16,-9.96,-4.48,2.92,1.38,17.68,-7.85],
    [5.16,10.04,4.29,-2.95,13.12,2.47,8.65,-1.42,9.78,-4.73,4.31,-11.71,-2.48,3.32,-5.08,-1.73],
    [-11.47,13.78,-12.74,-2.43,7.02,-3.82,12.39,-1.66,8.46,-15.18,-3.44,37.78,34.36,5.9,21.13,1.78],
]

# Create the pandas DataFrame 
df = pd.DataFrame(df, columns = ['A1', 'A2','A3','A4','B1', 'B2','B3','B4','C1', 'C2','C3','C4','D1', 'D2','D3','D4'])

kMeans = KMeans(n_clusters=2, random_state=0)
kMeans.fit(df)
clusters=kMeans.labels_

Thank you.

scikit-learn's KMeans does not accept a custom distance metric, so you have to use a different package. One package that does support custom metrics is pyclustering. Assuming you have installed it, here is an example solution.

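  1. Define your Pearson distance:
from scipy import stats
def pearson_dist(x, y):
    # Pearson correlation coefficient r between the two vectors
    r = stats.pearsonr(x, y)[0]
    # (1 - r) / 2 rescales 1 - r from [0, 2] to [0, 1]; return 1 - r for the unscaled form
    return (1 - r) / 2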
  2. Do the clustering with pyclustering:
from pyclustering.cluster.kmeans import kmeans
from pyclustering.utils.metric import type_metric, distance_metric
from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer

# convert the DataFrame to a NumPy array (one row per sample)
sample = df.to_numpy()
# wrap pearson_dist as a user-defined metric
metric = distance_metric(type_metric.USER_DEFINED, func=pearson_dist)
# choose 2 initial centers with k-means++
initial_centers = kmeans_plusplus_initializer(sample, 2).initialize()
# create the k-means instance with the custom metric
kmeans_instance = kmeans(sample, initial_centers, metric=metric)
# run the cluster analysis
kmeans_instance.process()
# get the clusters: a list of row-index lists, one per cluster
clusters = kmeans_instance.get_clusters()
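
Note that `get_clusters()` returns one list of row indices per cluster, not a flat label array like `kMeans.labels_` in the question. If you need the flat form, a minimal sketch of the conversion could look like this. The helper name `clusters_to_labels` and the `example_clusters` value are made up for illustration and are not part of pyclustering:

```python
def clusters_to_labels(clusters, n_samples):
    """Turn a list of row-index lists into one cluster label per row.

    Illustrative helper (hypothetical name), not a pyclustering function.
    """
    labels = [-1] * n_samples  # -1 marks rows that appear in no cluster
    for cluster_id, row_indices in enumerate(clusters):
        for row in row_indices:
            labels[row] = cluster_id
    return labels


# Made-up stand-in for kmeans_instance.get_clusters() on a 5-row dataset
example_clusters = [[0, 3], [1, 2, 4]]
example_labels = clusters_to_labels(example_clusters, n_samples=5)
print(example_labels)  # [0, 1, 1, 0, 1]
```

With the real result this would presumably be called as `clusters_to_labels(clusters, len(sample))`, and the returned list could then be attached to the DataFrame as a new column.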

