Why am I getting NaN when I toggle the loop?

I'm trying to write a k-means function in JavaScript. Here is my code.

function kmeans(arrayToProcess,cluster_n){
    var pointDimension = arrayToProcess[0].length;
    var ClusterResult = new Array();
    var ClusterCenter = new Array();
    var oldClusterCenter = new Array();
    var changed=false;
    for(var i = 0;i<cluster_n;i++)
        ClusterCenter.push(arrayToProcess[randomInt(arrayToProcess.length-1)]);

    console.log(ClusterCenter);

    // do{
    for(var k=0;k<50;k++){//loop
        for(var i = 0; i<cluster_n; i++){
            ClusterResult[i] = new Array();
        }
        for(var i = 0; i<arrayToProcess.length; i++){
            //for every point element
            var oldDistance=-1;
            var newClusterNumber = 0;
            for(var j = 0; j<cluster_n; j++){
                //for every cluster
                var distance = Math.abs(computeDistanceBetween(arrayToProcess[i], ClusterCenter[j]));   
                if (oldDistance == -1){
                    oldDistance = distance;
                    newClusterNumber = j;
                }else if ( distance <= oldDistance ){
                    newClusterNumber = j;
                    oldDistance = distance;
                }
            }
            ClusterResult[newClusterNumber].push(arrayToProcess[i]);
        }
        oldClusterCenter = ClusterCenter;
        //compute new centroid
        for(var i = 0; i<cluster_n; i++){
            newCentroid = pinit(pointDimension);
            for(var j = 0; j<ClusterResult[i].length; j++){
                newCentroid = padd(ClusterResult[i][j], newCentroid);
            }
            ClusterCenter[i] = pdivide(newCentroid, ClusterResult[i].length);
        }

        changed=false;
        for(var i = 0; i<cluster_n; i++){
            if(!pequal(ClusterCenter[i],oldClusterCenter[i]))
                changed = true;
        }
    }//while (changed == true);

    return ClusterResult;
}


function computeDistanceBetween(a,b){
    var result = 0;
    for(var i = 0; i<a.length;i++) result += a[i] * b[i];
    return result;
}

function pinit(n){
    var result = new Array(n);
    for(var i=0;i<n;i++) result[i] = 0;
    return result;
}

function padd(a,b){
    var result = new Array(a.length);
    for(var i = 0; i<a.length;i++) result[i] = a[i] + b[i];
    return result;
}

function pdivide(a,d){
    var result = new Array(a.length);
    for(var i = 0; i<a.length;i++) result[i] = a[i] / d;
    return result;
}

function pequal(a,b){
    for(var i = 0; i<a.length;i++) 
        if(a[i] != b[i]) return false;
    return true;
}

function randomInt(max){
    return randomIntBetween(0,max);
}

function randomIntBetween(min,max){
    return Math.floor(Math.random() * (max - min + 1)) + min;
}

If I disable the for-loop (k<0), the console gives the right answer. But if I enable the for-loop (k<1), the array ClusterCenter always contains some NaN items. Where does the NaN come from?

Edit: Further explanation: if the for-loop in the 14th line has been executed, the ClusterCenter above will contain some NaN items. Why?

Example input

var testArray = new Array();
for(var i=0; i<100; i++) testArray.push([randomInt(-150,150),randomInt(-150,150)]);
kmeans(testArray,4);

the ClusterCenter above will contain some NaN items. Why?

Because you're dividing zero by zero, which is not a number. This happens for every empty cluster in ClusterResult: for such a cluster the code computes ClusterCenter[i] = pdivide(pinit(pointDimension), 0);
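To see the effect in isolation, here is a minimal sketch that reuses pinit and pdivide from the question and feeds them an empty cluster. The variable names emptyCluster and centroid are only for illustration.

```javascript
// Helpers copied from the question.
function pinit(n) {
    var result = new Array(n);
    for (var i = 0; i < n; i++) result[i] = 0;
    return result;
}

function pdivide(a, d) {
    var result = new Array(a.length);
    for (var i = 0; i < a.length; i++) result[i] = a[i] / d;
    return result;
}

// An empty cluster: the coordinate sum stays at [0, 0] and the count is 0,
// so every coordinate of the centroid is computed as 0 / 0.
var emptyCluster = [];
var centroid = pdivide(pinit(2), emptyCluster.length);

console.log(centroid);                      // both entries are NaN
console.log(centroid.every(Number.isNaN));  // true
```

Once a center contains NaN, every distance computed against it is NaN as well, and comparisons with NaN are always false, so that cluster can never win a point back.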

How to deal with empty clusters? Possible strategies I can think of are to define 0/0 = 0, to choose a new random cluster center, or to drop the cluster altogether (cluster_n--).
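As a rough sketch of the second strategy (reseeding an empty cluster from a random input point), the centroid update could look like the following. recomputeCenters is a hypothetical helper, not part of the original code, and which strategy suits your data best is for you to judge.

```javascript
// Hypothetical helper (not in the question): recompute all centroids and
// reseed any empty cluster with a randomly chosen input point.
function recomputeCenters(clusterResult, points, pointDimension) {
    var centers = [];
    for (var i = 0; i < clusterResult.length; i++) {
        var members = clusterResult[i];
        if (members.length === 0) {
            // Empty cluster: avoid 0/0 by copying a random point as the new center.
            var pick = Math.floor(Math.random() * points.length);
            centers.push(points[pick].slice());
            continue;
        }
        // Non-empty cluster: the centroid is the coordinate-wise mean.
        var sum = new Array(pointDimension).fill(0);
        for (var j = 0; j < members.length; j++) {
            for (var d = 0; d < pointDimension; d++) sum[d] += members[j][d];
        }
        centers.push(sum.map(function (s) { return s / members.length; }));
    }
    return centers;
}

var points = [[0, 0], [2, 2], [10, 10]];
var clusters = [[[0, 0], [2, 2]], [], [[10, 10]]];  // the middle cluster is empty
var centers = recomputeCenters(clusters, points, 2);
console.log(centers);  // the middle center is one of the input points, never NaN
```

Because the helper builds a fresh centers array instead of mutating the old one, it also keeps the previous centers intact for a later convergence check.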

But why do you get so many empty clusters in the first place? Because your computeDistanceBetween function is seriously flawed: it computes a dot product, so every point other than (0, 0) has a non-zero "distance" to itself. Choose a more reasonable distance function, like Euclidean distance. It always returns a non-negative number, which renders the Math.abs in the loop superfluous.
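For comparison, here is a sketch of a Euclidean distance next to the question's function. The names dotProduct and euclideanDistance are mine; dotProduct has the same body as the original computeDistanceBetween.

```javascript
// Same body as the question's computeDistanceBetween: a dot product.
function dotProduct(a, b) {
    var result = 0;
    for (var i = 0; i < a.length; i++) result += a[i] * b[i];
    return result;
}

// Euclidean distance: square root of the summed squared differences.
function euclideanDistance(a, b) {
    var sum = 0;
    for (var i = 0; i < a.length; i++) {
        var diff = a[i] - b[i];
        sum += diff * diff;
    }
    return Math.sqrt(sum);
}

var p = [3, 4];
console.log(dotProduct(p, p));              // 25: the point looks far from itself
console.log(dotProduct(p, [0, 0]));         // 0: yet "close" to the origin
console.log(euclideanDistance(p, p));       // 0
console.log(euclideanDistance(p, [0, 0]));  // 5
```

If you only compare distances to find the nearest center, you could skip the Math.sqrt and compare squared distances instead, since the ordering is the same.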


Some other points:

  • newCentroid is missing a var statement and leaks into the global scope.
  • Your changed check is flawed. After oldClusterCenter = ClusterCenter, both variables hold the same array, which is then mutated. Not only is pequal(ClusterCenter[i],oldClusterCenter[i]) always true, but even ClusterCenter[i]===oldClusterCenter[i], because oldClusterCenter === ClusterCenter.

    To fix this, either use oldClusterCenter = ClusterCenter.slice() or add ClusterCenter = new Array(cluster_n); after the assignment.

  • Your code for computing the nearest cluster could be simplified to

     var newClusterNumber = 0,
         oldDistance = computeDistanceBetween(arrayToProcess[i], ClusterCenter[0]);
     for (var j=1; j<cluster_n; j++) {
         var distance = computeDistanceBetween(arrayToProcess[i], ClusterCenter[j]);
         if (distance <= oldDistance) {
             newClusterNumber = j;
             oldDistance = distance;
         }
     }

    or

     var newClusterNumber, oldDistance=Infinity;
     for (var j=0; j<cluster_n; j++) {
         var distance = computeDistanceBetween(arrayToProcess[i], ClusterCenter[j]);
         if (distance <= oldDistance) {
             newClusterNumber = j;
             oldDistance = distance;
         }
     }
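To illustrate the point about changed, here is a small sketch of the difference between aliasing the centers array and taking a shallow copy with slice(). The names aliased and snapshot and the sample values are made up for the demonstration.

```javascript
var ClusterCenter = [[1, 1], [5, 5]];

// Aliasing: both names refer to the same array object.
var aliased = ClusterCenter;
// Shallow copy: a new outer array holding the same inner point arrays.
var snapshot = ClusterCenter.slice();

// The update step replaces whole entries, as the question's loop does
// with ClusterCenter[i] = pdivide(...).
ClusterCenter[0] = [2, 2];

console.log(aliased === ClusterCenter);   // true
console.log(aliased[0]);                  // [2, 2]: the "old" centers changed too
console.log(snapshot === ClusterCenter);  // false
console.log(snapshot[0]);                 // [1, 1]: the previous center is preserved
```

A shallow copy is enough here only because pdivide returns a fresh array for each center; if the code mutated the inner point arrays in place, the snapshot would need to copy those as well.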
