
Flattening nested loops / decreasing complexity - complementary pairs counting algorithm

I was recently trying to solve a task in Python, and I found a solution that seems to have O(n log n) complexity, but I believe it is very inefficient for some inputs (such as the first parameter being 0 and values being a very long list of zeros).

It also has three levels of for loops. I believe it can be optimized, but at the moment I cannot optimize it any further; I am probably just missing something obvious ;)

So, basically, the problem is as follows:

Given a list of integers (values), the function needs to return the number of index pairs that meet the following criteria:

  • let's assume a single index pair is a tuple like (index1, index2),
  • then values[index1] == complementary_diff - values[index2] is true,

Example: If given a list like [1, 3, -4, 0, -3, 5] as values and 1 as complementary_diff, the function should return 4 (which is the length of the following list of index pairs: [(0, 3), (2, 5), (3, 0), (5, 2)]).
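
As a baseline (my own sketch, not part of the original attempt), the counting rule can be checked directly with two nested loops in O(n^2) time; every ordered pair of indices, including index1 == index2, is tested against the criterion:

def complementary_pairs_number_naive(complementary_diff, values):
    # Brute-force reference: test every ordered (index1, index2) pair.
    count = 0
    for index1 in range(len(values)):
        for index2 in range(len(values)):
            if values[index1] == complementary_diff - values[index2]:
                count += 1
    return count

# complementary_pairs_number_naive(1, [1, 3, -4, 0, -3, 5]) == 4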

This is what I have so far. It should work perfectly most of the time, but - as I said - in some cases it can run very slowly, despite its approximate complexity of O(n log n) (the pessimistic complexity looks like O(n^2)).

def complementary_pairs_number(complementary_diff, values):
    value_key = {} # dictionary storing indexes indexed by values
    for index, item in enumerate(values):
        try:
            value_key[item].append(index)
        except (KeyError,): # the item has not been found in value_key's keys
            value_key[item] = [index]
    key_pairs = set() # key pairs are unique by nature
    for pos_value in value_key: # iterate through keys of value_key dictionary
        sym_value = complementary_diff - pos_value
        if sym_value in value_key: # checks if the symmetric value has been found
            for i1 in value_key[pos_value]: # iterate through pos_values' indexes
                for i2 in value_key[sym_value]: # as above, through sym_values
                    # add indexes' pairs or ignore if already added to the set
                    key_pairs.add((i1, i2))
                    key_pairs.add((i2, i1))
    return len(key_pairs)

For the given example it behaves like this:

>>> complementary_pairs_number(1, [1, 3, -4, 0, -3, 5])
4

If you see how the code could be "flattened" or "simplified", please let me know.

I am not sure if just checking for complementary_diff == 0 etc. is the best approach - if you think it is, please let me know.

EDIT: I have corrected the example (thanks, unutbu!).

I think this improves the complexity to O(n):

  • value_key.setdefault(item, []).append(index) is faster than using the try..except blocks. It is also faster than using a collections.defaultdict(list). (I tested this with ipython %timeit.)
  • The original code visits every solution twice. For each pos_value in value_key, there is a unique sym_value associated with pos_value. There are solutions when sym_value is also in value_key. But when we iterate over the keys in value_key, pos_value is eventually assigned the value of sym_value, which makes the code repeat a calculation it has already done. So you can cut the work in half if you stop pos_value from equaling an old sym_value. I implemented that with a seen = set() to keep track of the sym_values already seen.
  • The code only cares about len(key_pairs), not the key_pairs themselves. So instead of keeping track of the pairs (with a set), we can simply keep track of the count (with num_pairs). So we can replace the two inner for-loops with

     num_pairs += 2*len(value_key[pos_value])*len(value_key[sym_value]) 

    or half that in the "unique diagonal" case, pos_value == sym_value.


def complementary_pairs_number(complementary_diff, values):
    value_key = {} # dictionary storing indexes indexed by values
    for index, item in enumerate(values):
        value_key.setdefault(item,[]).append(index)
    # print(value_key)
    num_pairs = 0
    seen = set()
    for pos_value in value_key: 
        if pos_value in seen: continue
        sym_value = complementary_diff - pos_value
        seen.add(sym_value)
        if sym_value in value_key: 
            # print(pos_value, sym_value, value_key[pos_value],value_key[sym_value])
            n = len(value_key[pos_value])*len(value_key[sym_value])
            if pos_value == sym_value:
                num_pairs += n
            else:
                num_pairs += 2*n
    return num_pairs

You may want to look into functional programming idioms, such as reduce, etc.

Oftentimes, nested array logic can be simplified by using functions like reduce, map, reject, etc.

For an example (in JavaScript) check out Underscore.js. I'm not terribly smart at Python, so I don't know which libraries it has available.
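
In Python, the closest analogues are map, filter, sum over a generator expression, and collections.Counter. Purely as an illustration of that style (this is my own sketch, not something the answer above names), the counting could be phrased as:

from collections import Counter

def complementary_pairs_number_functional(complementary_diff, values):
    # Count occurrences of each distinct value, then pair every occurrence
    # of v with every occurrence of complementary_diff - v; summing over all
    # distinct v covers both orderings of each index pair.
    counts = Counter(values)
    return sum(counts[v] * counts.get(complementary_diff - v, 0) for v in counts)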

I think (some or all of) these would help, but I'm not sure how I would prove it yet.

1) Take values and reduce it to a distinct set of values, recording the count of each element (O(n)).

2) Sort the resulting array (O(n log n)).

3) If you can allocate lots of memory, I guess you might be able to populate a sparse array with the values - so if the range of values is -100 : +100, allocate an array of [201], and for any value that exists in the reduced set, set a one at that value's index in the large sparse array.

4) For any value that you want to check against the condition, you now only have to look at the index in the sparse array given by the x - y relationship and see whether a value exists there.

5) As unutbu pointed out, it's trivially symmetric, so if {a, b} is a pair, so is {b, a}.
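
A rough sketch of this approach, assuming the value range is small enough to allocate an array over it (names and details are my own; a dict would do the same job when the range is large or unknown, and the sort from step 2 is not needed with this layout):

def complementary_pairs_number_sparse(complementary_diff, values):
    if not values:
        return 0
    # Step 1: occurrence count per distinct value, stored in a dense
    # "sparse array" covering the range [min(values), max(values)].
    lo, hi = min(values), max(values)
    counts = [0] * (hi - lo + 1)
    for v in values:
        counts[v - lo] += 1
    # Steps 4-5: for each value x, look up its complement complementary_diff - x;
    # visiting every x covers both orderings {a, b} and {b, a} automatically.
    total = 0
    for x in range(lo, hi + 1):
        c = counts[x - lo]
        if c:
            comp = complementary_diff - x
            if lo <= comp <= hi:
                total += c * counts[comp - lo]
    return total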

I think you can improve this by separating out the algebra part from the search and using smarter data structures.

  1. Go through the list and subtract each item from the complementary diff:

     resultlist[index] = complementary_diff - originallist[index] 

    You can use either a map or a simple loop. -> Takes O(n) time.

  2. See if each number in the resulting list exists in the original list.

    • Here, with a naive list, you would actually get O(n^2), because you can end up searching the whole original list for each item in the resulting list.

    • However, there are smarter ways to organize your data than this. If you have the original list sorted, your search time reduces to O(n log n + n log n) = O(n log n): n log n for the sort, and n log n for the binary search per element (see the sketch after this list).

    • If you want to be even smarter, you can make your list into a dictionary (or hash table), and then this step becomes O(n + n) = O(n): n to build the dictionary and 1 * n to search for each element in the dictionary. (EDIT: Since you cannot assume uniqueness of each value in the original list, you might want to keep count of how many times each value appears in the original list.)

So with this you now get O(n) total runtime.
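
For completeness, here is a small sketch of the sorted-plus-binary-search variant mentioned in the second bullet, using the bisect module (my own illustration; each element of the result list is looked up once in a sorted copy of the original list):

from bisect import bisect_left, bisect_right

def complementary_pairs_number_sorted(complementary_diff, values):
    ordered = sorted(values)  # O(n log n)
    total = 0
    for v in values:  # n binary searches, O(n log n) overall
        target = complementary_diff - v
        # Number of occurrences of target in the sorted copy.
        total += bisect_right(ordered, target) - bisect_left(ordered, target)
    return total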

Using your example:

1, [1, 3, -4, 0, -3, 5],
  1. Generate the result list:

     >>> resultlist
     [0, -2, 5, 1, 4, -4]
  2. Now we search:

    • Flatten out the original list into a dictionary. I chose to use the original list's index as the value, as that seems like side data you're interested in.

       >>> original_table {1: 0, 3: 1, -4: 2, 0: 3, -3: 4, 5: 5}
    • For each element in the result list, search in the hash table and make the tuple:

       (resultlist_index, original_table[resultlist[resultlist_index]]) 

      This should look like the example solution you had.

  3. Now you just find the length of the resulting list of tuples.

Now here's the code:

example_diff = 1
example_values = [1, 3, -4, 0, -3, 5]
example2_diff = 1
example2_values = [1, 0, 1]

def complementary_pairs_number(complementary_diff, values):
    """
        Given an integer complement and a list of values, count how many
        complementary pairs of indices there are in the list.
    """
    print "Input:", complementary_diff, values
    # Step 1. Result list
    resultlist = [complementary_diff - value for value in values]
    print "Result List:", resultlist

    # Step 2. Flatten into dictionary
    original_table = {}
    for original_index in xrange(len(values)):
        if values[original_index] in original_table:
            original_table[values[original_index]].append(original_index)
        else:
            original_table[values[original_index]] = [original_index]
    print "Flattened dictionary:", original_table

    # Step 2.5 Search through dictionary and count up the resulting pairs.
    pair_count = 0
    for resultlist_index in xrange(len(resultlist)):
        if resultlist[resultlist_index] in original_table:
            pair_count += len(original_table[resultlist[resultlist_index]])
    print "Complementary Pair Count:", pair_count

    # (Optional) Step 2.5 Search through dictionary and create complementary pairs. Adds O(n^2) complexity.
    pairs = []
    for resultlist_index in xrange(len(resultlist)):
        if resultlist[resultlist_index] in original_table:
            pairs += [(resultlist_index, original_index) for original_index in
                original_table[resultlist[resultlist_index]]]
    print "Complementary Pair Indices:", pairs

    # Step 3
    return pair_count

if __name__ == "__main__":
    complementary_pairs_number(example_diff, example_values)
    complementary_pairs_number(example2_diff, example2_values)

Output:

$ python complementary.py
Input: 1 [1, 3, -4, 0, -3, 5]
Result List: [0, -2, 5, 1, 4, -4]
Flattened dictionary: {0: [3], 1: [0], 3: [1], 5: [5], -4: [2], -3: [4]}
Complementary Pair Count: 4
Complementary Pair Indices: [(0, 3), (2, 5), (3, 0), (5, 2)]
Input: 1 [1, 0, 1]
Result List: [0, 1, 0]
Flattened dictionary: {0: [1], 1: [0, 2]}
Complementary Pair Count: 4
Complementary Pair Indices: [(0, 1), (1, 0), (1, 2), (2, 1)]

Thanks!

Modified the solution provided by @unutbu:

The problem can be reduced to comparing these 2 dictionaries:

  1. values

  2. pre-computed dictionary for (complementary_diff - values[i])

def complementary_pairs_number(complementary_diff, values):
    value_key = {}  # dictionary storing indexes indexed by values
    for index, item in enumerate(values):
        value_key.setdefault(item, []).append(index)
    answer_key = {}  # dictionary storing indexes indexed by (complementary_diff - values)
    for index, item in enumerate(values):
        answer_key.setdefault(complementary_diff - item, []).append(index)
    num_pairs = 0
    print(value_key)
    print(answer_key)
    for pos_value in value_key:
        if pos_value in answer_key:
            num_pairs += len(value_key[pos_value]) * len(answer_key[pos_value])
    return num_pairs
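
As a quick check against the question's example (my own illustration; the first two lines come from the debug prints, shown assuming a Python version where dicts preserve insertion order, and the expected result of 4 matches the original post):

>>> complementary_pairs_number(1, [1, 3, -4, 0, -3, 5])
{1: [0], 3: [1], -4: [2], 0: [3], -3: [4], 5: [5]}
{0: [0], -2: [1], 5: [2], 1: [3], 4: [4], -4: [5]}
4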
