
How to calculate the minimum unfairness sum of a list

I have tried to summarize the problem statement as follows:

Given n, k and an array (a list) arr, where n = len(arr) and k is an integer in the range 1 to n inclusive.

For an array (or list) myList, the Unfairness Sum is defined as the sum of the absolute differences between all possible pairs (combinations of 2 elements each) in myList.

To explain: if mylist = [1, 2, 5, 5, 6], then its unfairness sum (written MUS below) is the following. Please note that elements are considered unique by their index in the list, not by their values.

MUS = |1-2| + |1-5| + |1-5| + |1-6| + |2-5| + |2-5| + |2-6| + |5-5| + |5-6| + |5-6|
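As a quick check of the definition, here is a minimal sketch that evaluates the sum above directly. The function name unfairness_sum is only illustrative and not part of the problem statement. itertools.combinations pairs elements by position, which matches the note that elements are unique by index rather than by value.

```python
from itertools import combinations

def unfairness_sum(values):
    # Sum of |a - b| over every pair of positions in the list.
    return sum(abs(a - b) for a, b in combinations(values, 2))

print(unfairness_sum([1, 2, 5, 5, 6]))  # 26
```

The ten terms are 1 + 4 + 4 + 5 + 3 + 3 + 4 + 0 + 1 + 1 = 26.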

If you actually need to look at the problem statement, it's HERE.

My Objective

Given n, k, arr (as described above), find the Minimum Unfairness Sum out of the unfairness sums of all possible sub arrays, with the constraint that each len(sub array) = k [which is a good thing to make our lives easy, I believe :) ]

What I have tried

Well, there is a lot to be added in here, so I'll try to be as short as I can.

My first approach was this, where I used itertools.combinations to get all the possible combinations and statistics.variance to check the spread of the data (yeah, I know I'm a mess).
Before you see the code below: do you think variance and unfairness sum are perfectly related (I know they are strongly related), i.e. does the sub array with minimum variance have to be the sub array with MUS?

You only have to check the LetMeDoIt(n, k, arr) function. If you need an MCVE, check the second code snippet below.

from itertools import combinations as cmb
from statistics import variance as varn

def LetMeDoIt(n, k, arr):
    v = []
    s = []
    subs = [list(x) for x in list(cmb(arr, k))]  # getting all sub arrays from arr in a list

    i = 0
    for sub in subs:
        if i != 0:
            var = varn(sub)  # the variance thingy
            if float(var) < float(min(v)):
                v.remove(v[0])
                v.append(var)
                s.remove(s[0])
                s.append(sub)
            else:
                pass

        elif i == 0:
            var = varn(sub)
            v.append(var)
            s.append(sub)
            i = 1

    final = []
    f = list(cmb(s[0], 2))  # getting list of all pairs (after determining sub array with least MUS)
    
    for r in f:
        final.append(abs(r[0]-r[1]))  # calculating the MUS in my messy way

    return sum(final)

The above code works fine for n<30 but raises a MemoryError beyond that. In Python chat, Kevin suggested that I try a generator, which is memory efficient (it really is), but since a generator still produces those combinations on the fly as we iterate over them, it was estimated to take over 140 hours (:/) for n=50, k=8.
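To illustrate the trade-off just described, here is a minimal sketch of the generator idea. It is not Kevin's actual suggestion verbatim; the function names are made up for illustration, and it minimizes the unfairness sum directly instead of the variance. Memory stays flat because min() consumes one subset at a time, but the number of subsets is unchanged.

```python
from itertools import combinations
from math import comb  # Python 3.8+

def unfairness(values):
    # Sum of |a - b| over every pair of positions.
    return sum(abs(a - b) for a, b in combinations(values, 2))

def min_unfairness_streaming(arr, k):
    # The generator expression yields one k-subset at a time,
    # so no list of all subsets is ever built in memory.
    return min(unfairness(sub) for sub in combinations(arr, k))

print(min_unfairness_streaming([1, 2, 5, 5, 6], 3))  # 2, from the subset (5, 5, 6)
print(comb(50, 8))  # 536878650 subsets for n=50, k=8
```

With over half a billion subsets at n=50, k=8, the running time is dominated by the enumeration itself, which is why saving memory alone does not help.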

I posted the same as a question on SO HERE (you might want to have a look to understand me properly; it has discussions and an answer by fusion which takes me to my second approach, a better one (I should say fusion's approach xD)).

Second Approach

from itertools import combinations as cmb

def myvar(arr):   # a function to calculate variance
    l = len(arr)
    m = sum(arr)/l
    return sum((i-m)**2 for i in arr)/l

def LetMeDoIt(n, k, arr):
    sorted_list = sorted(arr)  # i think sorting the array makes it easy to get the sub array with MUS quickly
    variance = None
    min_variance_sub = None
    
    for i in range(n - k + 1):
        sub = sorted_list[i:i+k]
        var = myvar(sub)
        if variance is None or var<variance:
            variance = var
            min_variance_sub=sub
            
    final = []
    f = list(cmb(min_variance_sub, 2))  # again getting all possible pairs in my messy way

    for r in f:
        final.append(abs(r[0] - r[1]))

    return sum(final)

def MainApp():
    n = int(input())
    k = int(input())

    arr = list(int(input()) for _ in range(n))

    result = LetMeDoIt(n, k, arr)

    print(result)    

if __name__ == '__main__':
    MainApp()

This code works perfectly for n up to 1000 (maybe more), but terminates due to a time out (5 seconds is the limit on the online judge :/ ) for n beyond 10000 (the biggest test case has n=100000).

===== ======

How would you approach this problem to take care of all the test cases within the given time limit (5 sec)? (The problem was listed under algorithm & dynamic programming.)

For your reference, you can have a look at:

  1. successful submissions (py3, py2, C++, java) on this problem by other candidates, so that you can explain that approach for me and future visitors
  2. an editorial by the problem setter explaining how to approach the question
  3. a solution code by the problem setter himself (py2, C++)
  4. input data (test cases) and expected output

Edit1:

For future visitors of this question, the conclusions I have so far are
that variance and unfairness sum are not perfectly related (they are strongly related), which implies that among many lists of integers, the list with minimum variance doesn't always have to be the list with minimum unfairness sum. If you want to know why, I actually asked that as a separate question on math stack exchange HERE, where one of the mathematicians proved it for me xD (and it's worth taking a look, 'cause it was unexpected).
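A small counterexample makes this concrete. The two lists below are chosen purely for illustration and are not taken from the linked math question; the variance used here is the population variance, as in myvar above.

```python
from itertools import combinations
from statistics import pvariance

def unfairness(values):
    # Sum of |a - b| over every pair of positions.
    return sum(abs(a - b) for a, b in combinations(values, 2))

a = [0, 0, 0, 4]
b = [0, 1, 3, 4]

# a has the smaller unfairness sum, yet b has the smaller variance.
print(unfairness(a), pvariance(a))
print(unfairness(b), pvariance(b))
```

Here a has unfairness sum 12 and variance 3, while b has unfairness sum 14 and variance 2.5, so picking the list with minimum variance would pick the wrong one.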

As far as the overall question is concerned, you can read the answers by archer & Attersson below (I'm still trying to figure out a naive approach to carry this out; it shouldn't be far off by now though).


Thank you for any help or suggestions :)

You must work on your list SORTED and check only sublists with consecutive elements. This is because any sublist that includes at least one non-consecutive element will have a higher (or at best equal) unfairness sum.

For example, if the list is

[1,3,7,10,20,35,100,250,2000,5000] and you want to check sublists of length 3, then the solution must be one of [1,3,7], [3,7,10], [7,10,20], etc. Any other sublist, e.g. [1,3,10], will have a higher unfairness sum because 10 > 7, so all its differences with the rest of the elements will be larger than they would be with 7. The same holds for [1,7,10] (non-consecutive on the left side), since 1 < 3.

Given that, you only have to check consecutive sublists of length k, which reduces the execution time significantly.
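As a sanity check of this claim on the example list, here is a small sketch comparing an exhaustive search over all length-k subsets with a scan over consecutive windows of the sorted list. The helper names are hypothetical, and this is only a spot check on one input, not a proof.

```python
from itertools import combinations

def unfairness(values):
    # Sum of |a - b| over every pair of positions.
    return sum(abs(a - b) for a, b in combinations(values, 2))

def brute_force(arr, k):
    # Try every k-element subset (only feasible for tiny inputs).
    return min(unfairness(c) for c in combinations(arr, k))

def sorted_windows(arr, k):
    # Try only the n - k + 1 consecutive windows of the sorted list.
    s = sorted(arr)
    return min(unfairness(s[i:i + k]) for i in range(len(s) - k + 1))

arr = [1, 3, 7, 10, 20, 35, 100, 250, 2000, 5000]
print(brute_force(arr, 3), sorted_windows(arr, 3))  # 12 12
```

Both searches agree on 12, which comes from the window [1,3,7].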

Regarding coding, something like this should work:

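import itertools

def myvar(array):
    # unfairness sum: absolute difference summed over all pairs
    return sum(abs(a - b) for a, b in itertools.combinations(array, 2))

def minsum(n, k, arr):
    l = sorted(arr)  # only consecutive windows of the sorted list are checked
    res = myvar(l[:k])  # start from the first window
    for i in range(1, n - k + 1):  # n - k + 1 windows in total
        res = min(res, myvar(l[i:i+k]))
    return res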

I see this question still has no complete answer. I will outline a correct algorithm which will pass the judge. I will not write the code, in order to respect the purpose of the Hackerrank challenge, and since we already have working solutions.

  1. The original array must be sorted. This has a complexity of O(NlogN).

  2. At this point you only need to check consecutive sub arrays, as non-consecutive ones will result in a worse (or equal, but not better) "unfairness sum". This is also explained in archer's answer.

  3. The last pass, finding the minimum "unfairness sum", can be done in O(N). You need to calculate the US for every consecutive k-long subarray. The mistake is recalculating it from scratch at every step, which takes O(k) per step and brings the complexity of this pass to O(k*N). Each step can instead be done in O(1), as the editorial you posted shows, including the mathematical formulae. It requires initializing a cumulative (prefix sum) array after step 1 (done in O(N), with space complexity O(N) too).

It works but terminates due to time out for n<=10000.

(from the comments on archer's answer)

To explain step 3, think about k = 100. You are scrolling through the N-long array, and on the first iteration you must calculate the US for the sub array from element 0 to 99 as usual, requiring 100 steps. The next step needs you to calculate the same for a sub array that differs from the previous one by only one element: 1 to 100. Then 2 to 101, etc. If it helps, think of it like a snake: one block is removed and one is added. There is no need to perform the whole O(k) scan. Just work out the maths as explained in the editorial and you will do it in O(1).
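The editorial's formulae are not reproduced here, so the following is only a sketch of one way the O(1) update can be worked out, derived independently and not necessarily matching the editorial. The function name min_unfairness_sum is hypothetical. In the sorted array a, the window starting at i and the next one share the k-1 elements a[i+1..i+k-1]; call their sum inner. Dropping a[i] removes inner - (k-1)*a[i] from the US, and adding a[i+k] contributes (k-1)*a[i+k] - inner, so the US changes by (k-1)*(a[i] + a[i+k]) - 2*inner, where inner comes from the cumulative array.

```python
from itertools import accumulate

def min_unfairness_sum(k, arr):
    a = sorted(arr)                      # step 1: O(N log N)
    n = len(a)
    prefix = [0] + list(accumulate(a))   # prefix[i] = a[0] + ... + a[i-1]

    # US of the first window in O(k): in sorted order, a[j] is added
    # j times and subtracted (k - 1 - j) times.
    current = sum(a[j] * (2 * j - k + 1) for j in range(k))
    best = current

    # Slide the window: drop a[i], add a[i + k], O(1) per step.
    for i in range(n - k):
        inner = prefix[i + k] - prefix[i + 1]   # sum of the shared elements
        current += (k - 1) * (a[i] + a[i + k]) - 2 * inner
        best = min(best, current)
    return best

print(min_unfairness_sum(3, [1, 3, 7, 10, 20, 35, 100, 250, 2000, 5000]))  # 12
```

On the list from archer's answer, the first window [1,3,7] gives 12 and the first slide gives 12 + 2*(1 + 10) - 2*10 = 14 for [3,7,10], which matches a direct calculation.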

So the final complexity will asymptotically be O(NlogN), due to the initial sort.
