Using next in a list comprehension

Question

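I am trying to do something very simple, but I may be overcomplicating it:

This is the problem:

Suppose you live in a controlled economy where the town has a single baker who bakes a fixed number of loaves each day. The townspeople queue up to buy a loaf of bread (you can only buy one).

There are more people in the queue than loaves available. Everyone in the queue gets a ticket with their number in the queue to prevent queue jumping, and they stand in the same order every day (to keep it simple). The bread is ready at a different time each day, and some people in the queue need to go to work. If the bread is not ready before they have to leave for work, they leave the queue and the next person in line takes their place, but they still keep their original queue ticket. The values in the original list are the number of hours each person in the queue can wait before they have to leave for work.

I want to know the number on the last ticket handed to the baker each day before he runs out of bread.

I can get my existing code to work for a relatively small number of people, but with millions of people and lots of days (planned economies plan 5 years ahead), you get the picture.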

def BakerQueue(loaves, people, bake_time):
    got_some_bread = []
    for b in bake_time:
        counter = 0
        for p in range(len(people)):
            if people[p] >= b:
                counter += 1
                if counter == loaves:
                    got_some_bread.append(p + 1)
                    counter = 0
                    break
                elif p == len(people) - 1:
                    got_some_bread.append(0)
                    break
            elif counter < loaves and p == len(people) - 1:
                got_some_bread.append(0)
                counter = 0
    return got_some_bread

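You can run the code with the call below. In this example there are 3 loaves, 19 people in the list, and a different baking time for each day of the week. On the first day, tickets 1, 2 and 3 get bread; on the second day, tickets 2, 3 and 4 get bread; on the third day, tickets 7, 9 and 15 get bread. I only care about who gets the last loaf each day, which is what the function returns.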

BakerQueue(3, [1, 4, 4, 3, 1, 2, 6, 1, 9, 4, 4, 3, 1, 2, 6, 9, 4, 5, 8],[1, 2, 5, 4, 5, 4, 7])

This returns, as expected:

[3, 4, 15, 7, 15, 7, 19]

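Essentially, I want to prioritise by list index and pop any values that are greater than another value.

I have a list, my_list = [1, 4, 4, 3, 1, 2, 6], and I want to preserve its index priority, so I enumerated both the index and the value into a new list:

my_list_of_tuples = [(i, j) for i, j in enumerate(my_list)]

This gives me: [(0, 1), (1, 4), (2, 4), (3, 3), (4, 1), (5, 2), (6, 6)]

Then I turn it into a heap:

heapq.heapify(my_list_of_tuples)

Now I want to check whether the value at the top of the heap is greater than a constant taken from a separate list that I am iterating over. If it is, I want to pop it from the heap with heapq.heappop(my_list_of_tuples).

I thought the code for this would look like the following, but it does not work. How do I access the value at the top of the heap? I want to write something like this: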

    counter = 0
    while counter <= static_constant:
        if next([v[1] for v in my_list_of_tuples]) < iterated_constant:
            heapq.heappop(my_list_of_tuples)
        else:
            counter += 1

I would appreciate some help on how to handle the list comprehension generator. Thanks

Answer 1

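I think I understand your problem.

Problem description

Given:

num_items - the number of available items
targets - a list of potential targets, each with a value
threshold - a cutoff limit

Task:

Select the first num_items elements of targets whose value is greater than or equal to threshold.
Return the array index (starting at 1) of the last element selected from targets, or 0 if not enough targets are available. (An odd decision; I would use 0-based indices and return len(targets) if nothing is found, but fine.)
Optimize for speed. targets and num_items are the same every time; threshold is the only value that changes.

Example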

num_items = 3
targets = [5,3,4,1,3,3,7,4]
threshold = 4

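The chosen targets are those at positions [0,2,6], with values [5,4,7], since they are the first 3 values greater than or equal to threshold. We are only looking for the index of the last one, which in this case is 6 (0-based; 7 with the 1-based numbering of the task).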
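The example can be checked with a few lines. This is only a sketch for illustration; the names qualifying and chosen are not part of the original code.

```python
num_items = 3
targets = [5, 3, 4, 1, 3, 3, 7, 4]
threshold = 4

# 0-based positions of all targets that meet the threshold
qualifying = [i for i, value in enumerate(targets) if value >= threshold]

# only the first num_items of them are served
chosen = qualifying[:num_items]

print(chosen)                        # [0, 2, 6]
print([targets[i] for i in chosen])  # [5, 4, 7]
print(chosen[-1])                    # 6
```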

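Approach

Your original idea was to iterate over all the people. That is very fast if the threshold is very low, but it can be very slow if the threshold is high, because we have to keep iterating until enough candidates are found.

I rewrote your original idea of iterating over all of them, because I could not make sense of your code: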

def choose_first_n(num_items, targets, threshold):
    for target_id, target in enumerate(targets):
        if target >= threshold:
            num_items -= 1
            if num_items == 0:
                return target_id + 1
    return 0

def baker_queue(num_loaves_per_day, people_max_waiting_time, required_baking_times):
    results = []
    for today_baking_time in required_baking_times:
        results.append(choose_first_n(num_loaves_per_day, people_max_waiting_time, today_baking_time))
    return results

print(baker_queue(3,
                  [1, 4, 4, 3, 1, 2, 6, 1, 9, 4, 4, 3, 1, 2, 6, 9, 4, 5, 8],
                  [1, 2, 5, 4, 5, 4, 7]))
# Returns: [3, 4, 15, 7, 15, 7, 19], as in the original code.
# Also, please provide expected return values in future, like I did here.

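Using a heap is an interesting idea, but I don't think we would benefit from it in any way here. Heaps are only really fast at removing and inserting items, and we do neither. We just iterate over the items.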
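As for the literal question of how to look at the top of a heap: heapq keeps the smallest entry at index 0, so heap[0] peeks at it without popping, and next() is not needed. The sketch below is an assumption about what the question was after. It stores (value, index) tuples so that the heap orders by value (the question stored (index, value), which orders by index), and the threshold of 3 is made up for illustration.

```python
import heapq

my_list = [1, 4, 4, 3, 1, 2, 6]

# Put the value first so the heap orders by value; the index breaks ties.
heap = [(value, index) for index, value in enumerate(my_list)]
heapq.heapify(heap)

iterated_constant = 3  # hypothetical threshold, for illustration only

# heap[0] is always the smallest entry; peek at it, pop while it is too small.
removed = []
while heap and heap[0][0] < iterated_constant:
    removed.append(heapq.heappop(heap))

print(removed)  # [(1, 0), (1, 4), (2, 5)]
print(heap[0])  # (3, 3)

# next() needs an iterator: a generator expression works, a list does not.
first_large = next((v for v in my_list if v >= iterated_constant), None)
print(first_large)  # 4
```

Calling next() on a list comprehension raises a TypeError because a list is not an iterator, which is why the loop in the question fails.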

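The fastest approach I can think of is to preprocess the targets list into something that can be queried efficiently by threshold, like building an "index" of the last chosen item.

Demonstration: we use our previous code and look at the result as a function of the threshold: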

def choose_first_n(num_items, targets, threshold):
    for target_id, target in enumerate(targets):
        if target >= threshold:
            num_items -= 1
            if num_items == 0:
                return target_id + 1
    return 0

targets = [1, 4, 4, 3, 1, 2, 6, 1, 9, 4, 4, 3, 1, 2, 6, 9, 4, 5, 8]
num_items = 3

for threshold in range(10):
    result = choose_first_n(num_items, targets, threshold)
    print(f"Threshold: {threshold}, Result: {result}")

Threshold: 0, Result: 3
Threshold: 1, Result: 3
Threshold: 2, Result: 4
Threshold: 3, Result: 4
Threshold: 4, Result: 7
Threshold: 5, Result: 15
Threshold: 6, Result: 15
Threshold: 7, Result: 19
Threshold: 8, Result: 19
Threshold: 9, Result: 0

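You can see that when the threshold goes up, the result goes up as well. The result never decreases as the threshold increases (it is monotonic, not strictly linear).

If we can compute the threshold values at which the result changes, we can compute the result directly with a divide-and-conquer (binary) search, which is a lot faster than iterating over the list. (O(log n) instead of O(n), in case you are familiar with Big-O notation.)

One thing to note here is that the last result is 0, which breaks this scheme. That is why it is beneficial to have indices start at 0 instead of 1, and to let the "error" case be len(targets) instead of 0.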
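A minimal sketch of that idea, assuming a 0-based variant of the function (the name choose_first_n_zero_based and the hand-written change-point lists are illustrative, read off the printed results rather than computed):

```python
import bisect

def choose_first_n_zero_based(num_items, targets, threshold):
    """0-based index of the num_items-th target >= threshold,
    or len(targets) if there are not enough such targets."""
    remaining = num_items
    for index, target in enumerate(targets):
        if target >= threshold:
            remaining -= 1
            if remaining == 0:
                return index
    return len(targets)

targets = [1, 4, 4, 3, 1, 2, 6, 1, 9, 4, 4, 3, 1, 2, 6, 9, 4, 5, 8]
results = [choose_first_n_zero_based(3, targets, t) for t in range(10)]
print(results)  # [2, 2, 3, 3, 6, 14, 14, 18, 18, 19]

# With len(targets) as the "not found" value the sequence never decreases.
assert results == sorted(results)

# Change points, written by hand from the results above:
# the largest threshold that still yields each index.
change_values = [1, 3, 4, 6, 8]
change_indices = [2, 3, 6, 14, 18]

def lookup(threshold):
    """Binary search over the change points instead of scanning targets."""
    position = bisect.bisect_left(change_values, threshold)
    if position >= len(change_indices):
        return len(targets)
    return change_indices[position]

print([lookup(t) for t in range(10)] == results)  # True
```

The binary search only works because the sequence is sorted, which is exactly what the 0 return value would break.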

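Preprocessing

The hardest part is the preprocessing needed to obtain that mapping.

Let's look at it from a different angle.

For simplicity, assume num_items is 3 and we have 10 targets. Does the chosen target lie within the first 5 targets?

The answer is: yes, if at least 3 of the first 5 targets are greater than or equal to the threshold. In other words, the third-largest number among those first 5 is the deciding factor. If the threshold is higher than that third-largest number, the chosen target does not lie within the first 5 targets.

So for every item, we need to compute the third-largest number among all items up to and including it. Interestingly, this is actually where a heap comes in handy ;)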
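As a small sketch of that step on its own: a min-heap capped at k entries always holds the k largest values seen so far, so its top is the k-th largest. The helper name prefix_kth_largest is hypothetical and only used here for illustration.

```python
import heapq

def prefix_kth_largest(values, k):
    """For each prefix of values, return the k-th largest value seen so far
    (the prefix minimum while fewer than k values have been seen)."""
    heap = []    # min-heap holding at most the k largest values seen so far
    result = []
    for value in values:
        heapq.heappush(heap, value)
        if len(heap) > k:
            heapq.heappop(heap)  # drop the smallest, keep the k largest
        result.append(heap[0])   # heap top is the k-th largest
    return result

targets = [1, 4, 4, 3, 1, 2, 6, 1, 9, 4, 4, 3, 1, 2, 6, 9, 4, 5, 8]
print(prefix_kth_largest(targets, 3))
# [1, 1, 1, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 6, 6, 6, 6, 8]
```

The output never decreases, and the positions where it changes (indices 2, 3, 6, 14 and 18) are the only ones a lookup table needs to store.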

Implementation

import heapq
import bisect

def preprocess(targets, num_items):
    # Our min-heap, will contain the num_items largest targets seen so far
    largest_targets_heap = []

    # Our first preprocessing result, will contain the
    # third largest number between the first item and the current item,
    # for every item.
    third_largest_number_per_target = []

    # Compute the third largest previous value for every target
    for target in targets:
        heapq.heappush(largest_targets_heap, target)
        if len(largest_targets_heap) > num_items:
            heapq.heappop(largest_targets_heap)

        current_third_largest = largest_targets_heap[0]
        third_largest_number_per_target.append(current_third_largest)

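    # We now have the third largest number for every target.
    # Now, consolidate that data into a lookup table, to prevent duplication.
    # Therefore, find the first occurrence of every number
    lookup_table_indices = []
    lookup_table_values = []

    # With fewer targets than items no threshold can ever be satisfied;
    # return an empty table so that every lookup reports "not found".
    if len(targets) < num_items:
        return (lookup_table_values, lookup_table_indices, num_items, len(targets))

    current_value = third_largest_number_per_target[num_items - 1]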

    # Push the (num_items-1)th value to account for the fact our heap wasn't filled up until the
    # first num_items were processed
    lookup_table_indices.append(num_items - 1)
    lookup_table_values.append(current_value)

    # Fill the rest of the lookup table
    for index, value in enumerate(third_largest_number_per_target):
        if index < num_items - 1:
            continue
        if value != current_value:
            lookup_table_indices.append(index)
            lookup_table_values.append(value)
            current_value = value

    # The lookup table we have, consisting of values, indices, a minimum and a maximum value
    lookup_table = (lookup_table_values, lookup_table_indices, num_items, len(targets))

    return lookup_table

def choose_first_n_preprocessed(lookup_table, threshold):
    (lookup_table_values, lookup_table_indices, min_value, max_value) = lookup_table

    # We need to find the first (value,index) pair in lookup table where value is larger or equal to threshold
    # We do this by using bisect, which is really fast. This is only possible because of our preprocessing.
    position = bisect.bisect_left(lookup_table_values, threshold)

    # If we didn't find a result in the preprocessed table, we return the max value, to indicate that the
    # threshold is too high.
    if position >= len(lookup_table_indices):
        return max_value

    # Read the result from the table of indices
    value = lookup_table_indices[position]
    return value

def baker_queue(num_loaves_per_day, people_max_waiting_time, required_baking_times):
    # Create the preprocessed lookup table
    lookup_table = preprocess(people_max_waiting_time, num_loaves_per_day)

    # For every day, compute the result
    results = []
    for today_baking_time in required_baking_times:
        # Use our fast lookup based algorithm now
        result = choose_first_n_preprocessed(lookup_table, today_baking_time)
        
        # Convert indices back to starting with 1, and 0 in error case, as
        # the original format was
        if result == len(people_max_waiting_time):
            results.append(0)
        else:
            results.append(result+1)
    return results

print(baker_queue(3,
                  [1, 4, 4, 3, 1, 2, 6, 1, 9, 4, 4, 3, 1, 2, 6, 9, 4, 5, 8],
                  [1, 2, 5, 4, 5, 4, 7]))
# [3, 4, 15, 7, 15, 7, 19]

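Theoretical analysis

This should now be much faster, especially for a large number of days, but also for a large number of people.

The complexity of the naive implementation is

O(days * people)

The complexity of the preprocessed implementation is

O(people * log(bread) + days * log(people))

This does not sound very different, but it is. It basically says that if the limiting factor is the number of people, the number of days hardly matters, and if the limiting factor is the number of days, the number of people hardly matters.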
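To get a feeling for the difference, the two expressions can be evaluated as rough operation counts. This is a back-of-the-envelope sketch: it ignores constant factors, the helper names are made up, and the sizes are the ones from the smaller benchmark setup in this answer.

```python
import math

def naive_ops(days, people):
    # every day scans (up to) the whole queue
    return days * people

def preprocessed_ops(days, people, bread):
    # one heap pass over the queue, then one binary search per day
    return people * math.log2(bread) + days * math.log2(people)

days, people, bread = 10_000, 10_000, 900

print(naive_ops(days, people))                       # 100000000
print(round(preprocessed_ops(days, people, bread)))  # a few hundred thousand
```

The counts differ by more than two orders of magnitude for these sizes.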

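Benchmark results

The setup is:

900 loaves per day
10,000 people
10,000 days

Results:

Naive: 2.13 seconds
Preprocessed: 0.012 seconds

I then pushed the preprocessed algorithm until it also took about 2 seconds, and arrived at these numbers:

90,000 loaves per day
1,000,000 people
1,000,000 days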

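I did not run these numbers on the naive algorithm, but scaling the measured 2.13 seconds by O(days * people), which is 100 * 100 = 10,000 times the work, suggests it would take roughly 21,000 seconds, or about 6 hours.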

Well, this took a while; I hope it was worth it ;)

I think this is my biggest post so far, and it was a really fun task!

I hope you appreciate it.

Greetings

Solution 1
0 Accepted 2020-10-30 16:58:05