
Using next in a list comprehension

I'm trying to do something quite simple, which I have probably overcomplicated:

This is the problem:

Let's say you are living in a controlled economy where there is a baker in town, and every day he bakes a certain number of loaves of bread. The people in the town queue up to buy a loaf of bread (you can only buy one loaf).

There are more people in the queue than loaves of bread available. Everyone in the queue gets a ticket showing their position in the queue to prevent queue jumping, and the order is the same every single day (keeping it simple). The bread is ready at a different time each day, and some people in the queue need to be at work. If the bread isn't ready before they have to leave for work, they leave the queue and the next person in line takes their place, but they still keep their original queue ticket. The values in the original list are the number of hours before each person in the queue has to leave for work.

I want to know the number on the last ticket given to the baker each day before he runs out of loaves of bread.

I can get my existing code to work for relatively small numbers of people, but with millions of people and lots of days (planned economies plan 5 years ahead) it gets too slow, you get the picture.

def BakerQueue(loaves, people, bake_time):
    got_some_bread = []
    for b in bake_time:
        counter = 0
        for p in range(len(people)):
            if people[p] >= b:
                counter += 1
                if counter == loaves:
                    got_some_bread.append(p + 1)
                    counter = 0
                    break
                elif p == len(people) - 1:
                    got_some_bread.append(0)
                    break
            elif counter < loaves and p == len(people) - 1:
                got_some_bread.append(0)
                counter = 0
    return got_some_bread

You can use this to run the code. In this example there are 3 loaves, 19 people in the list, and a different bake time for each day of the week. On the first day, tickets 1, 2 and 3 get loaves; on the second day, tickets 2, 3 and 4; on the third day, tickets 7, 9 and 15. I only care about who gets the last loaf on each day, which is what the function returns.

BakerQueue(3, [1, 4, 4, 3, 1, 2, 6, 1, 9, 4, 4, 3, 1, 2, 6, 9, 4, 5, 8],[1, 2, 5, 4, 5, 4, 7])

This returns, as expected:

[3, 4, 15, 7, 15, 7, 19]

Essentially, I want to prioritise by the index of a list and pop any values that are greater than another value.

I have a list, my_list = [1, 4, 4, 3, 1, 2, 6], and I want to maintain its index priority, so I have enumerated both index and value into a new list:

my_list_of_tuples = [(i, j) for i, j in enumerate(my_list)]

This gives me: [(0, 1), (1, 4), (2, 4), (3, 3), (4, 1), (5, 2), (6, 6)]

I then convert this into a heap:

heapq.heapify(my_list_of_tuples)

Now, I want to check whether the value at the top of the heap is greater than the iterated constant from a separate list I want to iterate through. If it is, I want to pop it from the heap with heapq.heappop(my_list_of_tuples).

The code I thought would do this is below, but it doesn't work. How can I access the value at the top of the heap? I thought of writing something like this:

    counter = 0
    while counter <= static_constant:
        if next([v[1] for v in my_list_of_tuples]) < iterated_constant:
            heapq.heappop(my_list_of_tuples)
        else:
            counter += 1

Hoping to get some help on how to deal with the list comprehension generator. Thank you.

I think I understood your problem.

Problem description

Given:

  • num_items - the number of available items
  • targets - a list of potential targets, each having a value
  • threshold - a cutoff limit

Task:

  • Choose the first num_items elements of targets whose values are above or equal to threshold.
  • Return the array index of the last chosen element from targets (starting with 1), or 0 if not enough targets are available. (Odd decision; I would have gone with indices starting at 0 and returned len(targets) if none were found, but fine.)
  • Optimize for speed. targets and num_items are identical every time; threshold is the only value that changes.

Example

num_items = 3
targets = [5,3,4,1,3,3,7,4]
threshold = 4

The chosen targets would be the ones at positions [0,2,6] (0-based), with the values [5,4,7], as those are the first 3 values that are above or equal to threshold. We only look for the index of the last one, which in this case would be 6 (or 7 with the 1-based numbering of the task).
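The example can be reproduced with a generator and itertools.islice. This is only a minimal sketch of the task as stated above; the names eligible, chosen and last_index are made up for illustration, and returning None when too few targets qualify is an assumption, not part of the task.

```python
from itertools import islice

num_items = 3
targets = [5, 3, 4, 1, 3, 3, 7, 4]
threshold = 4

# Lazily yield the 0-based positions whose value meets the threshold
eligible = (i for i, value in enumerate(targets) if value >= threshold)

# Take at most num_items of them; the generator stops scanning after that
chosen = list(islice(eligible, num_items))

# Assumption for this sketch: None signals "not enough targets"
last_index = chosen[-1] if len(chosen) == num_items else None

print(chosen, last_index)
```

Because the generator is lazy, the scan stops as soon as the third match is found, which is the same early exit the loop-based versions rely on.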


Approach

Your original idea was to iterate through all the people. That is very fast if the threshold is very low, but becomes really slow if the threshold is higher, as we need to iterate through more of the people until we have found enough candidates.

I rewrote your original idea of iterating through all of them, as I wasn't able to understand your code:

def choose_first_n(num_items, targets, threshold):
    for target_id, target in enumerate(targets):
        if target >= threshold:
            num_items -= 1
            if num_items == 0:
                return target_id + 1
    return 0

def baker_queue(num_loaves_per_day, people_max_waiting_time, required_baking_times):
    results = []
    for today_baking_time in required_baking_times:
        results.append(choose_first_n(num_loaves_per_day, people_max_waiting_time, today_baking_time))
    return results

print(baker_queue(3,
                  [1, 4, 4, 3, 1, 2, 6, 1, 9, 4, 4, 3, 1, 2, 6, 9, 4, 5, 8],
                  [1, 2, 5, 4, 5, 4, 7]))
# Returns: [3, 4, 15, 7, 15, 7, 19], as in the original code.
# Also, please provide expected return values in future, like I did here.

Using a heap is an interesting idea, but I don't think we benefit from it here. Heaps are only really fast for item removal/insertion, which we don't do. We just iterate over them.
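As a side note on the snippet in the question: next() expects an iterator, and a list comprehension produces a list, so that call raises a TypeError. With heapq the top of the heap is simply element 0 of the underlying list. The sketch below illustrates both points; pop_while_below is a hypothetical helper name, and note that heapifying (index, value) tuples orders them by index, not by value.

```python
import heapq

my_list = [1, 4, 4, 3, 1, 2, 6]
heap = list(enumerate(my_list))  # (index, value) tuples
heapq.heapify(heap)              # ordered by index, the first tuple field

# next() needs an iterator; a list is only an iterable
try:
    next([value for _, value in heap])
    next_on_list_failed = False
except TypeError:
    next_on_list_failed = True

# Peek at the top of the heap without removing it
top_index, top_value = heap[0]

def pop_while_below(heap, threshold):
    """Pop entries from the top while their value is below threshold."""
    popped = []
    while heap and heap[0][1] < threshold:
        popped.append(heapq.heappop(heap))
    return popped

removed = pop_while_below(heap, 4)
print(next_on_list_failed, (top_index, top_value), removed, heap[0])
```

If an iterator over the list is really wanted, next(iter(some_list)) or a generator expression in parentheses works, but for a heap the heap[0] peek is the usual idiom.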

The fastest way that I could think of is to pre-process the targets list into something more efficient: an 'index' that maps a threshold directly to the position of the last chosen item.

Demonstration: we use our previous code and look at the results for each threshold value:

def choose_first_n(num_items, targets, threshold):
    for target_id, target in enumerate(targets):
        if target >= threshold:
            num_items -= 1
            if num_items == 0:
                return target_id + 1
    return 0

targets = [1, 4, 4, 3, 1, 2, 6, 1, 9, 4, 4, 3, 1, 2, 6, 9, 4, 5, 8]
num_items = 3

for threshold in range(10):
    result = choose_first_n(num_items, targets, threshold)
    print(f"Threshold: {threshold}, Result: {result}")

Output:

Threshold: 0, Result: 3
Threshold: 1, Result: 3
Threshold: 2, Result: 4
Threshold: 3, Result: 4
Threshold: 4, Result: 7
Threshold: 5, Result: 15
Threshold: 6, Result: 15
Threshold: 7, Result: 19
Threshold: 8, Result: 19
Threshold: 9, Result: 0

You can see that if the threshold goes up, the result goes up. Apart from the final 0, the result never decreases as the threshold increases: the relationship is monotonic (though not linear).

If we can compute the threshold values at which the result changes, we can compute the result directly via a binary search over those values, which is a LOT faster than iterating through the list (O(log n) instead of O(n), in case you are familiar with Big-O notation).

One thing to note here is that the last result is 0, which breaks that scheme. That is why it is beneficial to let the indices start at 0 instead of 1, and have the 'error' case be len(targets) instead of 0.
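To make that concrete, here is a small sketch that switches to 0-based indices with len(targets) as the 'not found' value, checks that the results are then non-decreasing, and collapses them into a breakpoint table that bisect can search. The table is built by brute force over the integer thresholds 0 to 9 purely for illustration; choose_first_n_zero_based, breakpoints, answers and lookup are hypothetical names, not part of the answer's code.

```python
import bisect

targets = [1, 4, 4, 3, 1, 2, 6, 1, 9, 4, 4, 3, 1, 2, 6, 9, 4, 5, 8]
num_items = 3

def choose_first_n_zero_based(num_items, targets, threshold):
    """0-based index of the last chosen target, or len(targets) if too few qualify."""
    remaining = num_items
    for index, target in enumerate(targets):
        if target >= threshold:
            remaining -= 1
            if remaining == 0:
                return index
    return len(targets)

thresholds = list(range(10))
results = [choose_first_n_zero_based(num_items, targets, t) for t in thresholds]

# With len(targets) as the error value the sequence never decreases
is_monotonic = all(a <= b for a, b in zip(results, results[1:]))

# Collapse runs of equal results: keep the largest threshold that still gives each result
breakpoints, answers = [], []
for t, r in zip(thresholds, results):
    if answers and answers[-1] == r:
        breakpoints[-1] = t
    else:
        breakpoints.append(t)
        answers.append(r)

def lookup(threshold):
    """Binary search for the first breakpoint >= threshold."""
    i = bisect.bisect_left(breakpoints, threshold)
    return answers[i] if i < len(answers) else len(targets)

print(results, is_monotonic, lookup(5), lookup(10))
```

Building the table this way still costs a full scan per threshold, so it only demonstrates the lookup side; the point of the preprocessing is to obtain the same table in a single pass.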

Preprocessing

The hardest thing is the preprocessing to get to that mapping.

Let's look at it from the other way round.

For the sake of simplicity, let's say num_items is 3, and we have 10 targets. Will the chosen targets be within the first 5 targets?

The answer is: yes, IF at least 3 of the first 5 targets are above or equal to the threshold. Or in other words, the 3rd largest number among the first 5 targets is the deciding factor. If the threshold is above that 3rd largest number, the chosen targets will not all be within the first 5 targets.

Therefore, for every position in the list, we need to compute the 3rd largest number among the targets up to and including that position. Funnily, this is actually where a heap WILL come in handy ;)
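The claim can be checked by brute force on the small example from the problem description. This sketch recomputes the 3rd largest value of every prefix with heapq.nlargest, which is deliberately slow and only meant to show what is being computed; kth_largest_per_prefix is a hypothetical helper name.

```python
import heapq

targets = [5, 3, 4, 1, 3, 3, 7, 4]
num_items = 3
threshold = 4

def kth_largest_per_prefix(targets, k):
    """k-th largest value of targets[:i + 1] for every i >= k - 1 (brute force)."""
    return [heapq.nlargest(k, targets[:i + 1])[-1]
            for i in range(k - 1, len(targets))]

prefix_values = kth_largest_per_prefix(targets, num_items)

# The first prefix whose 3rd largest value reaches the threshold ends
# exactly at the 0-based index of the last chosen target.
first_index = next(
    (i + num_items - 1 for i, value in enumerate(prefix_values) if value >= threshold),
    len(targets),  # error value if no prefix qualifies
)

print(prefix_values, first_index)
```

The prefix values never decrease, which is what makes a binary search over them possible.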

Implementation

import heapq
import bisect

def preprocess(targets, num_items):
    # Our min-heap; holds the num_items largest targets seen so far,
    # so its root is the num_items-th largest (the "third largest" for num_items = 3)
    largest_targets_heap = []

    # Our first preprocessing result, will contain the
    # third largest number between the first item and the current item,
    # for every item.
    third_largest_number_per_target = []

    # Compute the third largest previous value for every target
    for target in targets:
        heapq.heappush(largest_targets_heap, target)
        if len(largest_targets_heap) > num_items:
            heapq.heappop(largest_targets_heap)

        current_third_largest = largest_targets_heap[0]
        third_largest_number_per_target.append(current_third_largest)

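    # We now have the third largest number for every target.
    # Now, consolidate that data into a lookup table, to prevent duplication.
    # Therefore, find the first occurrence of every number
    lookup_table_indices = []
    lookup_table_values = []

    # With fewer targets than items no threshold can be satisfied;
    # an empty table makes every lookup fall through to len(targets).
    if len(targets) < num_items:
        return (lookup_table_values, lookup_table_indices, num_items, len(targets))

    current_value = third_largest_number_per_target[num_items - 1]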

    # Push the (num_items-1)th value to account for the fact our heap wasn't filled up until the
    # first num_items were processed
    lookup_table_indices.append(num_items - 1)
    lookup_table_values.append(current_value)

    # Fill the rest of the lookup table
    for index, value in enumerate(third_largest_number_per_target):
        if index < num_items - 1:
            continue
        if value != current_value:
            lookup_table_indices.append(index)
            lookup_table_values.append(value)
            current_value = value

    # The lookup table we have, consisting of values, indices, a minimum and a maximum value
    lookup_table = (lookup_table_values, lookup_table_indices, num_items, len(targets))

    return lookup_table

def choose_first_n_preprocessed(lookup_table, threshold):
    (lookup_table_values, lookup_table_indices, min_value, max_value) = lookup_table

    # We need to find the first (value,index) pair in lookup table where value is larger or equal to threshold
    # We do this by using bisect, which is really fast. This is only possible because of our preprocessing.
    position = bisect.bisect_left(lookup_table_values, threshold)

    # If we didn't find a result in the preprocessed table, we return the max value, to indicate that the
    # threshold is too high.
    if position >= len(lookup_table_indices):
        return max_value

    # Read the result from the table of indices
    value = lookup_table_indices[position]
    return value

def baker_queue(num_loaves_per_day, people_max_waiting_time, required_baking_times):
    # Create the preprocessed lookup table
    lookup_table = preprocess(people_max_waiting_time, num_loaves_per_day)

    # For every day, compute the result
    results = []
    for today_baking_time in required_baking_times:
        # Use our fast lookup based algorithm now
        result = choose_first_n_preprocessed(lookup_table, today_baking_time)
        
        # Convert indices back to starting with 1, and 0 in error case, as
        # the original format was
        if result == len(people_max_waiting_time):
            results.append(0)
        else:
            results.append(result+1)
    return results

print(baker_queue(3,
                  [1, 4, 4, 3, 1, 2, 6, 1, 9, 4, 4, 3, 1, 2, 6, 9, 4, 5, 8],
                  [1, 2, 5, 4, 5, 4, 7]))
# [3, 4, 15, 7, 15, 7, 19]
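One way to gain confidence in the lookup approach is a randomized comparison against the naive scan. The block below is a self-contained sketch: naive, build_table and lookup are compact stand-ins written for this check (they use 1-based positions and 0 as the error value directly), not the functions above, and the seed and value ranges are arbitrary.

```python
import bisect
import heapq
import random

def naive(num_items, targets, threshold):
    """1-based position of the num_items-th target >= threshold, else 0."""
    remaining = num_items
    for position, target in enumerate(targets, start=1):
        if target >= threshold:
            remaining -= 1
            if remaining == 0:
                return position
    return 0

def build_table(targets, num_items):
    """Compact stand-in for preprocess(): (values, 1-based positions)."""
    heap, values, positions = [], [], []
    for position, target in enumerate(targets, start=1):
        heapq.heappush(heap, target)
        if len(heap) > num_items:
            heapq.heappop(heap)
        # Record an entry once the heap is full and whenever its root changes
        if len(heap) == num_items and (not values or heap[0] != values[-1]):
            values.append(heap[0])
            positions.append(position)
    return values, positions

def lookup(table, threshold):
    values, positions = table
    i = bisect.bisect_left(values, threshold)
    return positions[i] if i < len(values) else 0

random.seed(0)  # arbitrary seed, only for reproducibility
mismatches = 0
for _ in range(200):
    targets = [random.randint(0, 20) for _ in range(random.randint(0, 40))]
    num_items = random.randint(1, 6)
    table = build_table(targets, num_items)
    for threshold in range(-1, 23):
        if naive(num_items, targets, threshold) != lookup(table, threshold):
            mismatches += 1

print("mismatches:", mismatches)
```

The random cases include empty lists and lists shorter than num_items, which exercise the 'not enough targets' path.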

Theoretical Analysis

This should now be a LOT faster, especially for a large number of days, but also for a large number of people.

The complexity of the naive implementation was

O(days * people)

The complexity of the preprocessed implementation is

O(people * log(bread) + days * log(people))

This doesn't sound a lot different, but it is. The two costs are now added instead of multiplied: if the limiting factor is the number of people, the number of days hardly matters, and if the limiting factor is the number of days, the number of people hardly matters.
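To get a feeling for the difference, the two formulas can be evaluated for some illustrative sizes. This is only a rough sketch: it ignores constant factors, assumes base-2 logarithms, and the function names and sizes are made up for the example, so the ratio is an order-of-magnitude hint rather than a predicted speedup.

```python
import math

def naive_steps(days, people):
    """Step count suggested by O(days * people)."""
    return days * people

def preprocessed_steps(days, people, bread):
    """Step count suggested by O(people * log(bread) + days * log(people))."""
    return people * math.log2(bread) + days * math.log2(people)

# Illustrative sizes (assumption for this sketch)
days, people, bread = 10_000, 10_000, 900

naive = naive_steps(days, people)
fast = preprocessed_steps(days, people, bread)
ratio = naive / fast

print(f"naive: {naive:,} steps, preprocessed: {fast:,.0f} steps, ratio: {ratio:.0f}x")
```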

Benchmarking Results

Setup was:

  • 900 bread per day
  • 10,000 people
  • 10,000 days

Result:

  • Naive: 2.13 seconds
  • Preprocessed: 0.012 seconds

I then pushed the problem size until the preprocessed algorithm also took about 2 seconds, and arrived at these numbers:

  • 90,000 bread per day
  • 1,000,000 people
  • 1,000,000 days

I didn't run those numbers on the naive algorithm, but scaling its 2.13 seconds by the growth in days * people (100 * 100 = 10,000) suggests it would have taken roughly 21,000 seconds, or about 6 hours.

Well, that took a while, I hope it was worth it ;)

I think this was my biggest post yet, it was a really interesting task!

I hope you appreciate it.

Greetings
