简体   繁体   中英

Median Maintenance Algorithm - Same implementation yields different results depending on Int32 or Int64

I found something interesting while doing a HW question.

The howework question asks to code the Median Maintenance algorithm.

The formal statement is as follows:

The goal of this problem is to implement the "Median Maintenance" algorithm (covered in the Week 5 lecture on heap applications). The text file contains a list of the integers from 1 to 10000 in unsorted order; you should treat this as a stream of numbers, arriving one by one. Letting x i denote the i th number of the file, the k th median m k is defined as the median of the numbers x 1 ,…,x k . (So, if k is odd, then m k is ((k+1)/2) th smallest number among x 1 ,…,x k ; if k is even, then m 1 is the (k/2) th smallest number among x 1 ,…,x k .)

In order to get O(n) running time, this should be implemented using heaps obviously. Anyways, I coded this using Brute Force (deadline was too soon and needed an answer right away) ( O(n 2 )) with the following steps:

  1. Read data in
  2. Sort array
  3. Find Median
  4. Add it to running time

I ran the algorithm through several test cases (with a known answer) and got the correct results, however when I was running the same algorithm on a larger data set I was getting the wrong answer. I was doing all the operations using Int64 ro represent the data. Then I tried switching to Int32 and magically I got the correct answer which makes no sense to me.

The code is below, and it is also found here (the data is in the repo). The algorithm starts to give erroneous results after the 3810 index:

    private static void Main(string[] args)
    {
        MedianMaintenance("Question2.txt");
    }

    private static void MedianMaintenance(string filename)
    {
        var txtData = File.ReadLines(filename).ToArray();
        var inputData32 = new List<Int32>();
        var medians32 = new List<Int32>();
        var sums32 = new List<Int32>();
        var inputData64 = new List<Int64>();
        var medians64 = new List<Int64>();
        var sums64 = new List<Int64>();
        var sum = 0;
        var sum64 = 0f;
        var i = 0;
        foreach (var s in txtData)
        {
            //Add to sorted list
            var intToAdd = Convert.ToInt32(s);

            inputData32.Add(intToAdd);
            inputData64.Add(Convert.ToInt64(s));

            //Compute sum
            var count = inputData32.Count;
            inputData32.Sort();
            inputData64.Sort();
            var index = 0;

            if (count%2 == 0)
            {
                //Even number of elements
                index = count/2 - 1;
            }
            else
            {
                //Number is odd
                index = ((count + 1)/2) - 1;
            }
            var val32 = Convert.ToInt32(inputData32[index]);
            var val64 = Convert.ToInt64(inputData64[index]);
            if (i > 3810)
            {
                var t = sum;
                var t1 = sum + val32;
            }
            medians32.Add(val32);
            medians64.Add(val64);
            //Debug.WriteLine("Median is {0}", val);
            sum += val32;
            sums32.Add(Convert.ToInt32(sum));
            sum64 += val64;
            sums64.Add(Convert.ToInt64(sum64));
            i++;
        }
        Console.WriteLine("Median Maintenance result is {0}", (sum).ToString("N"));
        Console.WriteLine("Median Maintenance result is {0}", (medians32.Sum()).ToString("N"));

        Console.WriteLine("Median Maintenance result is {0} - Int64", (sum64).ToString("N"));
        Console.WriteLine("Median Maintenance result is {0} - Int64", (medians64.Sum()).ToString("N"));
    }

What's more interesting is that the running sum (in the sum64 variable) yields a different result than summing all items in the list with LINQ's Sum() function.

The results (the thirs one is the one that's wrong): 控制台应用程序结果

These are the computer details: 电脑详情

I'll appreciate if someone can give me some insights on why is this happening.

Thanks,

0f is initializing a 32 bit float variable, you meant 0d or 0.0 to receive a 64 bit floating point.

As for linq, you'll probably get better results if you use strongly typed lists.

new List<int>()
new List<long>()

The first thing I notice is what the commenter did: var sum64 = 0f initializes sum64 as a float. As the median value of a collection of Int64s will itself be an Int64 (the specified rules don't use the mean between two midpoint values in a collection of even cardinality), you should instead declare this variable explicitly as a long . In fact, I would go ahead and replace all usages of var in this code example; the convenience of var is being lost here in causing type-related bugs.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM