简体   繁体   中英

find second largest in the list of numbers

I have to find the second largest number and largest number from the list by divide and conquer algorithm. The problem is that everything is right except the part that I use indices like a and b. Because it works faster. Cost cheaper. Do not need rewrite code or send other codes and approaches. Just help me please to fix it if u can.. Any helps any ideas welcome. Thanks

#!/usr/local/bin/python2.7

def two_max(arr,a,b):
      n = len(arr)
      if n==2: 
          if arr[0]<arr[1]: 
             return (arr[1], arr[0])
          else:
             return (arr[0], arr[1])

      (greatest_left, sec_greatest_left) = two_max(arr,a (a+b)/2)
      (greatest_right, sec_greatest_right) = two_max(arr,(a+b)/2,b)
      if greatest_left < greatest_right: 
          greatest = greatest_right
          if greatest_left < sec_greatest_left: 
              return (greatest, sec_greatest_left)
          else:
              return (greatest, greatest_left)
      else:
          greatest = greatest_left
          if greatest_right < sec_greatest_right: # Line 4
              return (greatest, sec_greatest_right)
          else:
              return (greatest, greatest_right) 

Why don't you do it that way:

>>> def getIlargest(arr, i):
        if (i <= len(arr) and i > 0):
            return sorted(arr)[-i]

>>> a = [1,3,51,4,6,23,53,2,532,5,2,6,7,5,4]      
>>> getIlargest(a, 2)
53

I took it one step further and tested 3 methods:

  1. Using counting sort - getIlargestVer2
  2. Using python sorted function - getIlargestVer1
  3. Using heap - heapIlargest as @abarnert suggested.

The results:

for arrays in sizes from 1 to ~5000 sorted is the best, for larger arrays the heapq.nlargest usage is the winner:

plot for arrays in sizes between [1*150, 55*150] :

在此输入图像描述

*Full scan between array in sizes of [1*150, 300*150]:*

在此输入图像描述

The code I used is the following, the 3 methods implementation is in setup string:

setup = """

import heapq, random

a = random.sample(xrange(1<<30), 150)
a = a * factor

class ILargestFunctions:

    # taken from [wiki][3] and was rewriting it.
    def counting_sort(self, array, maxval):
        m = maxval + 1
        count = {}
        for a in array:
            if count.get(a, None) is None:
                count[a] = 1
            else:
                count[a] += 1
        i = 0
        for key in count.keys():
            for c in range(count[key]):
                array[i] = key
                i += 1
        return array

    def getIlargestVer1(self, arr, i):
         if (i <= len(arr) and i > 0):
              return sorted(arr)[-i]

    def getIlargestVer2(self, arr, i):
         if (i <= len(arr) and i > 0):
              return self.counting_sort(arr, max(arr))[-i]

    def heapIlargest(self, arr, i):
        if (i <= len(arr) and i > 0):
            return heapq.nlargest(i,arr)

n = ILargestFunctions() 

"""

And the main line triggers the performance counting and plots the collected data is in:

import timeit
import numpy as np
import matplotlib.pyplot as plt

if __name__ == "__main__":
    results = {}
    r1 = []; r2 = []; r3 = [];
    x = np.arange(1,300,1)
    for i in xrange(1,300,1):
        print i
        factorStr = "factor = " + str(i) + ";"
        newSetupStr = factorStr + setup
        r1.append(timeit.timeit('n.getIlargestVer1(a, 100)', number=200, setup=newSetupStr))
        r2.append(timeit.timeit('n.getIlargestVer2(a, 100)', number=200, setup=newSetupStr))
        r3.append(timeit.timeit('n.heapIlargest(a, 100)', number=200, setup=newSetupStr))
        results[i] = (r1,r2,r3)
    p1 = plt.plot(x, r1, 'r', label = "getIlargestVer1")
    p2 = plt.plot(x, r2, 'b' , label = "getIlargestVer2")
    p3 = plt.plot(x, r3, 'g' , label = "heapIlargest")
    plt.legend(bbox_to_anchor=(1.05, 1), loc=1, borderaxespad=0.)
    plt.show()

The biggest problem is that you never get any closer to your recursive base case.

The base case is len(arr) == 2 . But every time you call yourself, you just pass arr as-is:

  (greatest_left, sec_greatest_left) = two_max(arr,a,(a+b)/2)
  (greatest_right, sec_greatest_right) = two_max(arr,(a+b)/2,b)

(Note that I'm guessing on the comma in the first one, because as you posted it, you're actually calling the number a as a function, which is unlikely to do anything useful…)

So, either your base case should take a and b into account, like this:

if b-a == 2:
    if arr[a]<arr[a+1]:
        return (arr[a+1], arr[a])
    else:
        return (arr[a], arr[a+1])

… or you should send a slice of arr instead of the whole thing—in which case you don't need a and b in the first place:

  (greatest_left, sec_greatest_left) = two_max(arr[:len(a)/2])
  (greatest_right, sec_greatest_right) = two_max(arr[len(a)/2:])

Either one will fix your first problem. Of course the function still doesn't work for most inputs. In fact, it only works if the length of the list is a power of two.

If that isn't a good enough hint for how to fix it: What happens if ba is 3? Obviously you can't split it into two halves, both of which are of size 2 or greater. So, you'll need to write another base case for ba == 1 , and return something that will make the rest of the algorithm work.

@0x90 has the right idea, but he got it reversed.

def find_i_largest_element(seq, i):
    if (i <= len(seq) and i > 0):
        s = sorted(seq, reverse=True)
        return s[i-1]

By the way, is this a homework assignment? If so, what's the whole idea behind the algorithm you have to use?

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM