Median Maintenance Algorithm - Same implementation yields different results depending on Int32 or Int64

I found something interesting while doing a homework question.

The homework question asks to code the Median Maintenance algorithm.

The formal statement is as follows:

The goal of this problem is to implement the "Median Maintenance" algorithm (covered in the Week 5 lecture on heap applications). The text file contains a list of the integers from 1 to 10000 in unsorted order; you should treat this as a stream of numbers, arriving one by one. Letting x_i denote the i-th number of the file, the k-th median m_k is defined as the median of the numbers x_1, …, x_k. (So, if k is odd, then m_k is the ((k+1)/2)-th smallest number among x_1, …, x_k; if k is even, then m_k is the (k/2)-th smallest number among x_1, …, x_k.)

To get O(log n) time per element (O(n log n) overall), this should obviously be implemented using heaps. Anyway, I coded it by brute force, which is O(n^2 log n) because the list is re-sorted on every insertion (the deadline was too close and I needed an answer right away), with the following steps:

  1. Read data in
  2. Sort array
  3. Find median
  4. Add it to the running sum
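For comparison with the brute-force steps above, here is a minimal sketch of the heap-based approach the assignment has in mind. It is not taken from the course material; the class and method names are hypothetical, and it assumes .NET 6 or later, where `PriorityQueue<TElement, TPriority>` is available. One heap holds the smaller half of the values seen so far, the other holds the larger half, and the median under the quoted rule is always the largest value in the smaller half.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class HeapMedianSketch
{
    // Returns the sum of the running medians of `stream`, where the k-th median
    // is the ((k+1)/2)-th smallest value for odd k and the (k/2)-th for even k.
    public static long SumOfRunningMedians(IEnumerable<int> stream)
    {
        // `low` holds the smaller half. Priorities are negated so that the
        // largest value comes out first (safe here because values are small
        // positive integers; negating int.MinValue would overflow).
        var low = new PriorityQueue<int, int>();
        // `high` holds the larger half as an ordinary min-heap.
        var high = new PriorityQueue<int, int>();
        long sum = 0L; // explicit long, so no floating point is involved

        foreach (int x in stream)
        {
            if (low.Count == 0 || x <= low.Peek())
                low.Enqueue(x, -x);
            else
                high.Enqueue(x, x);

            // Rebalance so that low.Count equals high.Count or high.Count + 1.
            if (low.Count > high.Count + 1)
            {
                int moved = low.Dequeue();
                high.Enqueue(moved, moved);
            }
            else if (high.Count > low.Count)
            {
                int moved = high.Dequeue();
                low.Enqueue(moved, -moved);
            }

            // The top of `low` is the current median under the quoted rule.
            sum += low.Peek();
        }

        return sum;
    }
}
```

Each insertion costs O(log k) heap work, so processing n values takes O(n log n) in total, with no re-sorting.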

I ran the algorithm through several test cases (with a known answer) and got the correct results; however, when I ran the same algorithm on a larger data set I got the wrong answer. I was doing all the operations using Int64 to represent the data. Then I tried switching to Int32 and magically got the correct answer, which makes no sense to me.

The code is below, and it can also be found here (the data is in the repo). The algorithm starts to give erroneous results after index 3810:

    private static void Main(string[] args)
    {
        MedianMaintenance("Question2.txt");
    }

    private static void MedianMaintenance(string filename)
    {
        var txtData = File.ReadLines(filename).ToArray();
        var inputData32 = new List<Int32>();
        var medians32 = new List<Int32>();
        var sums32 = new List<Int32>();
        var inputData64 = new List<Int64>();
        var medians64 = new List<Int64>();
        var sums64 = new List<Int64>();
        var sum = 0;
        var sum64 = 0f;
        var i = 0;
        foreach (var s in txtData)
        {
            //Add to sorted list
            var intToAdd = Convert.ToInt32(s);

            inputData32.Add(intToAdd);
            inputData64.Add(Convert.ToInt64(s));

            //Compute sum
            var count = inputData32.Count;
            inputData32.Sort();
            inputData64.Sort();
            var index = 0;

            if (count%2 == 0)
            {
                //Even number of elements
                index = count/2 - 1;
            }
            else
            {
                //Number is odd
                index = ((count + 1)/2) - 1;
            }
            var val32 = Convert.ToInt32(inputData32[index]);
            var val64 = Convert.ToInt64(inputData64[index]);
            if (i > 3810)
            {
                var t = sum;
                var t1 = sum + val32;
            }
            medians32.Add(val32);
            medians64.Add(val64);
            //Debug.WriteLine("Median is {0}", val);
            sum += val32;
            sums32.Add(Convert.ToInt32(sum));
            sum64 += val64;
            sums64.Add(Convert.ToInt64(sum64));
            i++;
        }
        Console.WriteLine("Median Maintenance result is {0}", (sum).ToString("N"));
        Console.WriteLine("Median Maintenance result is {0}", (medians32.Sum()).ToString("N"));

        Console.WriteLine("Median Maintenance result is {0} - Int64", (sum64).ToString("N"));
        Console.WriteLine("Median Maintenance result is {0} - Int64", (medians64.Sum()).ToString("N"));
    }

What's more interesting is that the running sum (in the sum64 variable) yields a different result than summing all items in the list with LINQ's Sum() function.

The results (the third one is the one that's wrong): [image: console application results]

These are the computer details: [image: computer details]

I'd appreciate it if someone could give me some insight into why this is happening.

Thanks,

0f initializes a 32-bit float variable; you probably meant 0d or 0.0 to get a 64-bit floating-point value.

As for LINQ, you'll probably get better results if you use strongly typed lists.

    new List<int>()
    new List<long>()
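To make the point about 0f concrete, here is a small sketch (the class and method names are hypothetical). A 32-bit float has a 24-bit significand, so it represents every integer exactly only up to 2^24 = 16,777,216. Beyond that, adding a small integer to a float accumulator can be rounded away. This would presumably explain why the running sum only goes wrong after a few thousand medians, once the total passes that threshold.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class FloatPrecisionDemo
{
    // True when `value` survives a round trip through a 32-bit float unchanged.
    public static bool IsExactAsFloat(long value)
    {
        float asFloat = (float)value;
        return (long)asFloat == value;
    }

    // Adds `increment` to a float accumulator the same way `sum64 += val64`
    // does when sum64 was declared with `var sum64 = 0f`.
    public static float AddToFloat(float accumulator, long increment)
    {
        accumulator += increment; // the long is implicitly converted to float
        return accumulator;
    }
}
```

With these helpers, `IsExactAsFloat(16777216)` is true but `IsExactAsFloat(16777217)` is false, and `AddToFloat(16777216f, 1)` returns 16777216 again because the increment is lost to rounding.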

The first thing I notice is what the commenter pointed out: var sum64 = 0f initializes sum64 as a float. As the median value of a collection of Int64s will itself be an Int64 (the specified rules don't use the mean between two midpoint values in a collection of even cardinality), you should instead declare this variable explicitly as a long. In fact, I would go ahead and replace all usages of var in this code example; the convenience of var is outweighed here by the type-related bugs it causes.
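As a rough sketch of that suggestion (not the asker's actual fix; the class and method names are hypothetical), the brute-force loop could be written with explicit long declarations. With an integer accumulator, the running sum and LINQ's Sum() over the stored medians should agree.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class BruteForceMedians
{
    // Re-sorts after every insertion, as in the question, but with every
    // variable declared explicitly so nothing is inferred as float.
    public static (long RunningSum, long LinqSum) SumMedians(IEnumerable<long> stream)
    {
        List<long> sorted = new List<long>();
        List<long> medians = new List<long>();
        long runningSum = 0L; // explicit long instead of `var sum64 = 0f`

        foreach (long x in stream)
        {
            sorted.Add(x);
            sorted.Sort();

            // Integer division makes one expression cover both cases:
            // odd count -> (count+1)/2 - 1, even count -> count/2 - 1.
            int index = (sorted.Count + 1) / 2 - 1;
            long median = sorted[index];

            medians.Add(median);
            runningSum += median;
        }

        return (runningSum, medians.Sum());
    }
}
```

Calling `SumMedians` on the parsed file contents returns both totals, so any disagreement between them would show up immediately.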
