简体繁体中英

DBSCAN algorithm and clustering algorithm for data mining

原文 2011-04-16 11:18:13 6 2 algorithm/ data-mining/ cluster-analysis/ dbscan

How do you implement DBSCAN algorithm on categorical data (mushroom data set)?

And what is a one pass clustering algorithm?

Could you provide pseudo code for a one pass clustering algorithm?

2 answers

You can run DBSCAN with an arbitrary distance function without any changes to it. The indexing part will be more difficult, so you will likely only get O(n^2) complexity.

But if you look closely at DBSCAN, all it does is compute distances, compare them to a threshold, and count objects. This is a key strength of it, it can easily be applied to various kinds of data, all you need is to define a distance function and thresholds.

I doubt there is a one-pass version of DBSCAN, as it relies on pairwise distances. You can prune some of these computations (this is where the index comes into play), but essentially you need to compare every object to every other object, so it is in O(n log n) and not one-pass.

One-pass: I believe the original k-means was a one-pass algorithm. The first k objects are your initial means. For every new object, you choose the closes mean and update it (incrementally) with the new object. As long as you don't do another iteration over your data set, this was "one-pass". (The result will be even worse than lloyd-style k-means though).

Read the first k items and hold them. Compute the distances between them.

For each remaining item:

Find out which of the k items it is closest to, and the distance between these two items.
If this is longer than the closest distance between any two of the k items, you can swap the new item with one of these two and at least not decrease the closest distance between any two of the new k items. Do so so as to increase this distance as much as possible.

Suppose that the set of all items can be divided up into l <= k clusters so that the distance between any two points in the same cluster is smaller than the distance between any two points in different clusters. Then after running this algorithm, you will retain at least one point from each cluster.

How to implement DBSCAN clustering algorithm?

data mining algorithm that suggest for this situation

Which Data Mining Algorithm is the best?

Clustering Algorithm (in Python) for Data

DBSCAN algorithm (recursion logic)

Altering the DBSCAN Algorithm

Data Mining issue with the apriori algorithm in C#

Web Navigation Patterns Mining / Network Clustering Algorithm/Approach for web traffic clustering

Clustering algorithm for Voice clustering

How to adjust this DBSCAN algorithm python

暂无

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

Related Question How to implement DBSCAN clustering algorithm? data mining algorithm that suggest for this situation Which Data Mining Algorithm is the best? Clustering Algorithm (in Python) for Data DBSCAN algorithm (recursion logic) Altering the DBSCAN Algorithm Data Mining issue with the apriori algorithm in C# Web Navigation Patterns Mining / Network Clustering Algorithm/Approach for web traffic clustering Clustering algorithm for Voice clustering How to adjust this DBSCAN algorithm python

Related Tags

DBSCAN algorithm and clustering algorithm for data mining

Question

2 answers

solution1
1 2012-02-08 18:00:24

solution2
-2 2011-04-16 13:01:41

DBSCAN algorithm and clustering algorithm for data mining

Question

2 answers

solution1 1 2012-02-08 18:00:24

solution2 -2 2011-04-16 13:01:41

solution1
1 2012-02-08 18:00:24

solution2
-2 2011-04-16 13:01:41