
Generating high-dimensional datasets with Scikit-Learn

I am working with the Mean Shift clustering algorithm, which is based on a kernel density estimate of the dataset. I would like to generate a large, high-dimensional dataset, and I thought the Scikit-Learn function make_blobs would be suitable. But when I generate a one-million-point, 8-dimensional dataset, almost every point ends up being treated as a separate cluster.

I am generating the blobs with standard deviation 1 and then setting the Mean Shift bandwidth to the same value (I think this makes sense, right?). For two-dimensional datasets this produced fine results, but for higher dimensions I think I'm running into the curse of dimensionality: the distances between points become too large for meaningful clustering.
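
Roughly what I'm doing (the number of centers and the random seed are arbitrary choices, and I've shrunk n_samples here so it runs in reasonable time; the real run uses 1 million points):

    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.cluster import MeanShift

    # 8-dimensional blobs with per-dimension standard deviation 1
    # (centers and random_state are illustrative values).
    X, y_true = make_blobs(n_samples=100_000, n_features=8,
                           centers=10, cluster_std=1.0, random_state=0)

    # Bandwidth set equal to the per-dimension standard deviation of the blobs.
    ms = MeanShift(bandwidth=1.0, bin_seeding=True)
    ms.fit(X)
    print("clusters found:", len(ms.cluster_centers_))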

Does anyone have tips or tricks for generating a good high-dimensional dataset that is suitable for (something like) Mean Shift clustering? Or am I doing something wrong, which is of course a good possibility?

The effective standard deviation of the clusters isn't 1.

You have 8 dimensions, each with a standard deviation of 1, so the overall standard deviation (the root-mean-square distance of a point from its cluster centre) is sqrt(8) ≈ 2.83, not 1. A bandwidth of 1 therefore reaches only a small fraction of a typical cluster.
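
You can check this quickly with NumPy (the sample size below is arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    d = 8
    # One isotropic Gaussian "blob": per-axis standard deviation 1, centred at the origin.
    points = rng.normal(loc=0.0, scale=1.0, size=(100_000, d))

    # Euclidean distance of each point from the blob centre.
    dist = np.linalg.norm(points, axis=1)
    print(np.sqrt(d))                    # 2.83...: RMS distance predicted by sqrt(d)
    print(np.sqrt((dist ** 2).mean()))   # ~2.83  : empirical RMS distance
    print(dist.mean())                   # ~2.74  : mean distance, close to sqrt(d)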

Kernel density estimation generally does not work well on high-dimensional data because choosing a good bandwidth becomes much harder.
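
One thing you can try (a rough workaround, not a definitive fix) is to scale the bandwidth with the dimensionality, or let scikit-learn pick one with estimate_bandwidth; the quantile and sample sizes below are arbitrary:

    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.cluster import MeanShift, estimate_bandwidth

    X, _ = make_blobs(n_samples=100_000, n_features=8,
                      centers=10, cluster_std=1.0, random_state=0)

    # Scale the bandwidth with sqrt(n_features) so it matches the typical
    # within-cluster distance rather than the per-axis standard deviation.
    bw_scaled = 1.0 * np.sqrt(X.shape[1])

    # Alternatively, estimate a bandwidth from pairwise distances on a subsample.
    bw_est = estimate_bandwidth(X, quantile=0.2, n_samples=2000, random_state=0)
    print("scaled:", bw_scaled, "estimated:", bw_est)

    ms = MeanShift(bandwidth=bw_scaled, bin_seeding=True).fit(X)
    print("clusters found:", len(ms.cluster_centers_))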
