
using next in a list comprehension

I'm trying to do something quite simple, which I have probably overcomplicated:

This is the problem:

Let's say you are living in a controlled economy where there is a baker in town, and every day he bakes a certain number of loaves of bread. The people in the town queue up to buy a loaf of bread (you can only buy one loaf).

There are more people in the queue than loaves of bread available. Everyone in the queue gets a ticket with their position in the queue to prevent queue jumping, and they line up in the same order every single day (keeping it simple). The bread is ready at a different time each day, and some people in the queue need to be at work; if the bread isn't ready before they have to leave, they leave the queue and the next person in line takes their place, but they keep their original ticket. The values in the original list are the number of hours each person can wait before they have to leave for work.

I want to know the number on the last ticket the baker collects each day before he runs out of loaves of bread.

I can get my existing code to work for relatively small numbers of people, but with millions of people and lots of days (planned economies plan five years ahead), you get the picture.

def BakerQueue(loaves, people, bake_time):
    got_some_bread = []
    for b in bake_time:
        counter = 0
        for p in range(len(people)):
            if people[p] >= b:
                counter += 1
                if counter == loaves:
                    got_some_bread.append(p + 1)
                    counter = 0
                    break
                elif p == len(people) - 1:
                    got_some_bread.append(0)
                    break
            elif counter < loaves and p == len(people) - 1:
                got_some_bread.append(0)
                counter = 0
    return got_some_bread

You can use this to run the code: in this example there are 3 loaves, 19 people in the list, and a different bake time for each day of the week. So on the first day tickets 1, 2, 3 get loaves, on the second day 2, 3, 4 get loaves, and on the third day 7, 9 and 15 get loaves. I only care about who gets the last loaf on each day, which is what the function returns.

BakerQueue(3, [1, 4, 4, 3, 1, 2, 6, 1, 9, 4, 4, 3, 1, 2, 6, 9, 4, 5, 8],[1, 2, 5, 4, 5, 4, 7])

This returns, as expected:

[3, 4, 15, 7, 15, 7, 19]

Essentially, I want to preserve the index order of a list and pop any entries whose values fall below a given threshold.

I have a list: my_list = [1, 4, 4, 3, 1, 2, 6] and I want to maintain its index priority, so I have enumerated index and value into a new list of tuples:

my_list_of_tuples = [(i, j) for i, j in enumerate(my_list)]

This gives me: [(0, 1), (1, 4), (2, 4), (3, 3), (4, 1), (5, 2), (6, 6)]

I then convert this into a heap

heapq.heapify(my_list_of_tuples)

Now, I want to check the value at the top of the heap against a constant from a separate list that I'm iterating through. If the heap value is smaller, I want to pop it from the heap with heapq.heappop(my_list_of_tuples)

The code I thought would do this is as follows, but it doesn't work. How can I access the value at the top of the heap? I thought of writing something like this:

counter = 0
while counter <= static_constant:
    if next([v[1] for v in my_list_of_tuples]) < iterated_constant:
        heapq.heappop(my_list_of_tuples)
    else:
        counter += 1

I'm hoping to get some help on how to deal with the generator / list comprehension. Thank you

I think I understood your problem.

Problem description

Given:

  • num_items - the number of available items
  • targets - a list of potential targets, each having a value
  • threshold - a cutoff limit

Task:

  • Choose the first num_items elements of targets whose values are greater than or equal to threshold.
  • Return the array index of the last chosen element from targets (starting at 1), or 0 if not enough targets qualify. (Odd decision; I would have gone with indices starting at 0 and returning len(targets) if not enough are found, but fine.)
  • Optimize for speed. targets and num_items are identical every time; threshold is the only value that changes.

Example

num_items = 3
targets = [5,3,4,1,3,3,7,4]
threshold = 4

The chosen targets would be the ones at positions [0, 2, 6], with the values [5, 4, 7], as those are the first 3 values that are greater than or equal to threshold. We only need the index of the last one, which in this case is 6.
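
A quick sanity check of that example (just a throwaway sketch of the selection rule; the real implementations follow below):

targets = [5, 3, 4, 1, 3, 3, 7, 4]
threshold = 4
num_items = 3

# Indices of the first num_items targets meeting the threshold
chosen = [i for i, t in enumerate(targets) if t >= threshold][:num_items]
print(chosen)      # [0, 2, 6]
print(chosen[-1])  # 6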


Approach

Your original idea was to iterate through all the people, which is very fast if the threshold is very low, but becomes really slow as the threshold rises, because we have to walk further and further through the list to find enough candidates.

I rewrote your original idea so it simply iterates through the list, as I wasn't able to follow your code:

def choose_first_n(num_items, targets, threshold):
    # Walk the targets in order, counting those that meet the threshold
    for target_id, target in enumerate(targets):
        if target >= threshold:
            num_items -= 1
            if num_items == 0:
                # 1-based index of the last chosen target
                return target_id + 1
    # Not enough qualifying targets
    return 0

def baker_queue(num_loaves_per_day, people_max_waiting_time, required_baking_times):
    results = []
    for today_baking_time in required_baking_times:
        results.append(choose_first_n(num_loaves_per_day, people_max_waiting_time, today_baking_time))
    return results

print(baker_queue(3,
                  [1, 4, 4, 3, 1, 2, 6, 1, 9, 4, 4, 3, 1, 2, 6, 9, 4, 5, 8],
                  [1, 2, 5, 4, 5, 4, 7]))
# Returns: [3, 4, 15, 7, 15, 7, 19], as in the original code.
# Also, please provide expected return values in future, like I did here.

Using a heap is an interesting idea, but I don't think we benefit from it in any way here. Heaps are only really fast for repeated insertion and removal of the smallest item, which we don't do; we just iterate over the elements.
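
(As a side note on the literal question: after heapq.heapify, the smallest element always sits at index 0, so you can peek at the top of the heap without popping and without building a list comprehension. A minimal sketch:)

import heapq

my_list = [1, 4, 4, 3, 1, 2, 6]
heap = list(enumerate(my_list))  # same (index, value) tuples as in the question
heapq.heapify(heap)

# Peek at the top of the heap without popping: the root is always heap[0]
index, value = heap[0]
print(index, value)  # 0 1

# A generator expression also works, but heap[0][1] is simpler:
value = next(v for _, v in heap)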

The fastest approach I could think of is to pre-process the people list into something more efficient; as in, build an 'index' that maps each threshold to the last chosen item.

Demonstration: we use our previous code and look at the result for every threshold value:

def choose_first_n(num_items, targets, threshold):
    for target_id, target in enumerate(targets):
        if target >= threshold:
            num_items -= 1
            if num_items == 0:
                return target_id + 1
    return 0

targets = [1, 4, 4, 3, 1, 2, 6, 1, 9, 4, 4, 3, 1, 2, 6, 9, 4, 5, 8]
num_items = 3

for threshold in range(10):
    result = choose_first_n(num_items, targets, threshold)
    print(f"Threshold: {threshold}, Result: {result}")

Output:

Threshold: 0, Result: 3
Threshold: 1, Result: 3
Threshold: 2, Result: 4
Threshold: 3, Result: 4
Threshold: 4, Result: 7
Threshold: 5, Result: 15
Threshold: 6, Result: 15
Threshold: 7, Result: 19
Threshold: 8, Result: 19
Threshold: 9, Result: 0

You can see that as the threshold goes up, the result goes up as well. The relationship is monotonically non-decreasing: the result only ever changes upwards, at specific threshold values.

If we can compute the thresholds at which the result changes, we can find the result directly via binary search, which is a LOT faster than iterating through the list (O(log n) instead of O(n), in case you are familiar with Big-O notation).
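
To make that concrete, here is a throwaway sketch of such a breakpoint table, derived by hand from the printed results above (it assumes integer thresholds; the general preprocessing comes next):

import bisect

# First threshold of each range, and the result for that range
thresholds = [0, 2, 4, 5, 7, 9]
results = [3, 4, 7, 15, 19, 0]

def lookup(threshold):
    # Find the last breakpoint <= threshold
    return results[bisect.bisect_right(thresholds, threshold) - 1]

print([lookup(t) for t in range(10)])
# [3, 3, 4, 4, 7, 15, 15, 19, 19, 0]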

One thing to note here is that the last result is 0, which breaks that monotonic scheme. That is the reason why it is beneficial to let the indices start at 0 and make the 'error' case len(targets) instead of 0.

Preprocessing

The hardest thing is the preprocessing to get to that mapping.

Let's look at it from the other way round.

For the sake of simplicity, let's say num_items is 3 and we have 10 targets. Will the chosen targets all be within the first 5 targets?

The answer is: yes, IF at least 3 of the first 5 targets are greater than or equal to the threshold. In other words, the 3rd-largest number among the first 5 is the deciding factor: if the threshold is above that 3rd-largest number, the chosen targets will not all be within the first 5 targets.

Therefore, for every prefix of the list, we need to compute the num_items-th largest number (the 3rd largest in our example). Funnily enough, this is actually where a heap WILL come in handy ;)
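
The trick is a min-heap capped at num_items elements: its root is then always the num_items-th largest value seen so far. A minimal sketch of just that part, using the example targets from above (running_kth_largest is my own illustrative name):

import heapq

def running_kth_largest(values, k):
    heap = []  # min-heap holding the k largest values seen so far
    for v in values:
        heapq.heappush(heap, v)
        if len(heap) > k:
            heapq.heappop(heap)  # discard the smallest of the k+1
        # Root = k-th largest of the prefix (or smallest so far, while len < k)
        yield heap[0]

print(list(running_kth_largest([5, 3, 4, 1, 3, 3, 7, 4], 3)))
# [5, 3, 3, 3, 3, 3, 4, 4]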

Implementation

import heapq
import bisect

def preprocess(targets, num_items):
    # min-heap, will contain the num_items largest targets seen so far
    largest_targets_heap = []

    # Our first preprocessing result: for every item, the num_items-th
    # largest value (3rd largest in our example) seen between the first
    # item and the current item.
    third_largest_number_per_target = []

    # Compute the num_items-th largest previous value for every target
    for target in targets:
        heapq.heappush(largest_targets_heap, target)
        if len(largest_targets_heap) > num_items:
            heapq.heappop(largest_targets_heap)

        current_third_largest = largest_targets_heap[0]
        third_largest_number_per_target.append(current_third_largest)

    # We now have the num_items-th largest number for every target.
    # Now, consolidate that data into a lookup table, to avoid duplication:
    # record only the first occurrence of every value.
    lookup_table_indices = []
    lookup_table_values = []
    current_value = third_largest_number_per_target[num_items - 1]

    # Push the (num_items-1)th value to account for the fact that our heap
    # wasn't full until the first num_items targets were processed
    lookup_table_indices.append(num_items - 1)
    lookup_table_values.append(current_value)

    # Fill the rest of the lookup table
    for index, value in enumerate(third_largest_number_per_target):
        if index < num_items - 1:
            continue
        if value != current_value:
            lookup_table_indices.append(index)
            lookup_table_values.append(value)
            current_value = value

    # The lookup table we have, consisting of values, indices, a minimum and a maximum value
    lookup_table = (lookup_table_values, lookup_table_indices, num_items, len(targets))

    return lookup_table

def choose_first_n_preprocessed(lookup_table, threshold):
    (lookup_table_values, lookup_table_indices, min_value, max_value) = lookup_table

    # We need to find the first (value, index) pair in the lookup table
    # where value is greater than or equal to threshold. We do this with
    # bisect, which is really fast; this is only possible because of our
    # preprocessing.
    position = bisect.bisect_left(lookup_table_values, threshold)

    # If we didn't find a result in the preprocessed table, we return the
    # max value to indicate that the threshold is too high.
    if position >= len(lookup_table_indices):
        return max_value

    # Read the result from the table of indices
    value = lookup_table_indices[position]
    return value

def baker_queue(num_loaves_per_day, people_max_waiting_time, required_baking_times):
    # Create the preprocessed lookup table
    lookup_table = preprocess(people_max_waiting_time, num_loaves_per_day)

    # For every day, compute the result
    results = []
    for today_baking_time in required_baking_times:
        # Use our fast lookup based algorithm now
        result = choose_first_n_preprocessed(lookup_table, today_baking_time)
        
        # Convert indices back to 1-based, with 0 in the error case, to
        # match the original output format
        if result == len(people_max_waiting_time):
            results.append(0)
        else:
            results.append(result+1)
    return results

print(baker_queue(3,
                  [1, 4, 4, 3, 1, 2, 6, 1, 9, 4, 4, 3, 1, 2, 6, 9, 4, 5, 8],
                  [1, 2, 5, 4, 5, 4, 7]))
# [3, 4, 15, 7, 15, 7, 19]

Theoretical Analysis

This should now be a LOT faster, especially for a large number of days, but also for a large number of people.

The complexity of the naive implementation was

O(days * people)

The complexity of the preprocessed implementation is

O(people * log(bread) + days * log(people))

This doesn't sound like a big difference, but it is. It basically says: if the limiting factor is the people, it doesn't matter how many days there are; and if the limiting factor is the days, it doesn't matter how many people there are.
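
To put rough numbers on that (using the large benchmark setup below): the naive version costs about days × people = 10^6 × 10^6 = 10^12 elementary steps, while the preprocessed version costs roughly 10^6 × log2(90,000) + 10^6 × log2(10^6) ≈ 3.6 × 10^7 steps — more than four orders of magnitude less.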

Benchmarking Results

Setup was:

  • 900 loaves per day
  • 10,000 people
  • 10,000 days

Result:

  • Naive: 2.13 seconds
  • Preprocessed: 0.012 seconds

I then tried to push the algorithm so far that it also takes 2 seconds, and got those numbers:

  • 90,000 loaves per day
  • 1,000,000 people
  • 1,000,000 days

I didn't run those numbers on the naive algorithm, but extrapolating from the measurement above (100× the people and 100× the days, so roughly 10,000× the work) it would have taken on the order of 20,000 seconds, or about 6 hours.

Well that took a while, I hope it was worth it ;)

I think this was my biggest post yet, it was a really interesting task!

I hope you appreciate it.

Greetings
