
How can I generate three outlier points that are clearly far away from the normal data in Python?

I am using the make_moons dataset and trying to implement an outlier detection algorithm. To test it, I want to generate 3 points that lie far away from the normal data and check whether they are detected as outliers. These 3 points should be randomly selected from my data and should be as far as possible from the normal data. My algorithm compares the distance of each point with a threshold value to decide whether it is an outlier. I am aware of other resources on this topic, but I could not find a way to adapt those solutions to my dataset.

Here is my code to define the dataset and fit K-Means to it (I have to use the K-Means-fitted data):

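from sklearn.cluster import KMeans
from sklearn.datasets import make_moons

# Two interleaving half circles, 100 points, no noise
X, y = make_moons(n_samples=100, noise=0, random_state=0)

# Fit K-Means and keep the centroids and per-point cluster labels
n_clusters = 10
kmeans = KMeans(n_clusters=n_clusters, random_state=10)
kmeans.fit(X)
centroids = kmeans.cluster_centers_
labels = kmeans.labels_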

In short, how can I find the 3 farthest points in my data to use for outlier detection?

As stated in the comments, you should define a criterion for classifying outliers. Either way, in the following code, I randomly selected three rows from X and multiplied them by 1,000, which should make them outliers regardless of the definition you choose.

# Import libraries
import numpy as np
from sklearn.datasets import make_moons

# Create data
X, y = make_moons(100, random_state=123)

# Randomly select 3 row numbers from X
# (high is exclusive, so valid row indices are 0 .. len(X) - 1)
np.random.seed(5)
idx = np.random.randint(low=0, high=len(X), size=3)

# Overwrite the data from the randomly selected rows
for i in idx:
    scaler = 1000 # Change this number to whatever you need
    X[i] = X[i] * scaler

Note: There is a small probability that idx will contain duplicates. It won't happen with np.random.seed(5), but if you choose another seed (or don't set one at all) and get duplicates, simply try another seed or resample until all three indices are distinct.
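To tie this back to the K-Means setup in the question, here is a minimal sketch of one possible criterion, not the only one. It assumes K-Means is fitted on the clean data, that a point's score is its distance to the nearest centroid, and that the threshold is three times the largest such distance among the clean points. The threshold rule and the names X_out, dist_clean, dist_out and flagged are illustrative choices of mine, not something from the question. It also draws the three rows with replace=False, so duplicates cannot occur.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons

# Clean data and K-Means fit, mirroring the question's setup
X, y = make_moons(n_samples=100, noise=0, random_state=0)
kmeans = KMeans(n_clusters=10, n_init=10, random_state=10).fit(X)

# Pick 3 distinct row indices (replace=False rules out duplicates)
rng = np.random.default_rng(5)
idx = rng.choice(len(X), size=3, replace=False)

# Work on a copy so the clean data stays available for comparison
X_out = X.copy()
X_out[idx] *= 1000

# transform() gives the distance from each sample to every centroid;
# the row-wise minimum is the distance to the nearest centroid
dist_clean = kmeans.transform(X).min(axis=1)
dist_out = kmeans.transform(X_out).min(axis=1)

# Hypothetical threshold: 3x the largest clean distance (an assumption)
threshold = 3 * dist_clean.max()
flagged = np.flatnonzero(dist_out > threshold)

# The 3 clean points farthest from their nearest centroid, for reference
farthest3 = np.argsort(dist_clean)[-3:]

print("injected rows:", np.sort(idx))
print("flagged rows: ", flagged)
```

With a scale factor this large, the flagged rows should be exactly the injected ones. If you need something subtler, lower the factor and tune the threshold to your own definition of an outlier.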


 