Is there a way to loop faster?

Question

I want to iterate over a list that contains only numbers and check each value against a condition. If a value passes the test, I want to add it to a new list. Unfortunately, I don't think I can use a list comprehension, because not all values will be added to the same list.

I want to be able to do this:

def sort(values: list):
    sum_0 = sum(values)
    len_0 = len(values)
    average_0 = sum_0 / len_0
    lesser_list_0 = []
    greater_list_0 = []
    for value in values:
        if value >= average_0:
            greater_list_0.append(value)
        else:
            lesser_list_0.append(value)

But without being slowed down by the for loop. Also, is there a faster way to add a value to the end of either list than using the append method?

Answer 1

Since you need to read all values to perform this computation, you will need some kind of loop. What you want to avoid is a Python-level loop in numerical code where you care about speed.

I suggest you look into a specialized library for numerical computation, in particular numpy. It has functions to easily compute the average, and it has very powerful indexing: you can index an array with a single value, an array of integers, an array of booleans, and so on.

In the code below, we compare an array with a single scalar (the average) to get an array of booleans. We can then use this boolean array to select only the values in the original array where the corresponding boolean is True. This gives you exactly what you want.

import numpy as np


def separate_values(values: np.ndarray):
    average = np.mean(values)

    # This gives a boolean array with the same shape as `values`,
    # True only in places where the value is lower than the average
    mask1 = values < average
    mask2 = np.logical_not(mask1)  # We could also just write `values >= average`

    # We can use the boolean mask to index the original array.
    # This gives us an array with the elements lower than the average
    lesser = values[mask1]
    # This gives us an array with elements greater than or equal to the average
    greater = values[mask2]

    # Returns a tuple with both arrays
    return lesser, greater


if __name__ == '__main__':
    # A random array with 5 integers in the interval [0, 10)
    values = np.random.randint(0, 10, 5)

    lesser, greater = separate_values(values)

    print("Average:", np.mean(values))
    print("Values:", values)
    print("Values < average:", lesser)
    print("Values >= average:", greater)

You need to install numpy for this to work. It can be easily installed through pip, conda, etc.
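
Since the question starts from a plain Python list rather than an array, here is a minimal sketch of the same masking idea applied to a list. The sample numbers are made up for illustration, and the conversion with `np.asarray` has a cost of its own, so any speedup is an assumption to verify on your own data rather than a guarantee.

```python
import numpy as np

# Hypothetical input: a plain Python list, as in the question
values = [4, 7, 1, 9, 4]

# One-time conversion from list to ndarray
arr = np.asarray(values)

# Comparing the array with a scalar yields one boolean per element
mask = arr >= arr.mean()

# Boolean indexing keeps the elements where the mask is True;
# `~mask` inverts it. `tolist()` converts back to Python lists.
greater = arr[mask].tolist()
lesser = arr[~mask].tolist()

print("Mask:", mask)
print("Values >= average:", greater)
print("Values < average:", lesser)
```

If the rest of your program can work with arrays directly, skipping the `tolist()` calls avoids converting back and forth.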

Answer 2

Yes, you can use the pandas and numpy libraries for these operations. These libraries are optimized for this kind of work: they use C data types and run their loops in compiled code.

https://pandas.pydata.org/pandas-docs/stable/10min.html

You need to use slicing and subsetting. It works roughly like this (not exactly; refer to the docs for details):

specific_value = values_mean
my_dataframe[my_dataframe['values'] >= specific_value]

You can calculate the mean very efficiently with this: https://www.geeksforgeeks.org/python-pandas-dataframe-mean/
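
As a rough sketch of the boolean-indexing approach this answer describes, assuming a DataFrame with a single column named `values` (the column name comes from the answer; the sample numbers are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data in a one-column DataFrame
my_dataframe = pd.DataFrame({'values': [3, 8, 1, 9, 4]})

# Mean of the column, computed by pandas
values_mean = my_dataframe['values'].mean()

# Boolean Series: True where the value is at or above the mean
mask = my_dataframe['values'] >= values_mean

# Select rows with the mask, and the remaining rows with its inverse
greater = my_dataframe[mask]
lesser = my_dataframe[~mask]

print("Mean:", values_mean)
print("Values >= mean:", greater['values'].tolist())
print("Values < mean:", lesser['values'].tolist())
```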

Answer 3

List comprehensions are loops too, and all you really save is a lookup of greater_list_0.append or lesser_list_0.append on each round. Once you need to build two lists, which takes two comprehensions and therefore two passes over the data, the single for loop is faster. You can save a trivial amount of time by prestaging the two append methods you want. For the 3 scenarios shown below, the timing on my machine (in seconds, for 100 runs each) is:

for loop 1.0464496612548828
comprehensions 1.1907751560211182
less lookup 0.9023218154907227

And the test code is:

import random
import time

def sort(values: list):
    sum_0 = sum(values)
    len_0 = len(values)
    average_0 = sum_0 / len_0
    greater_list_0 = []
    lesser_list_0 = []
    for value in values:
        if value >= average_0:
            greater_list_0.append(value)
        else:
            lesser_list_0.append(value)

def sort2(values: list):
    sum_0 = sum(values)
    len_0 = len(values)
    average_0 = sum_0 / len_0
    greater_list_0 = [val for val in values if val >= average_0]
    lesser_list_0 = [val for val in values if val < average_0]

def sort_less_lookup(values: list):
    sum_0 = sum(values)
    len_0 = len(values)
    average_0 = sum_0 / len_0
    greater_list_0 = []
    lesser_list_0 = []
    g_append = greater_list_0.append
    l_append = lesser_list_0.append
    for value in values:
        if value >= average_0:
            g_append(value)
        else:
            l_append(value)

values = list(range(100000))
random.shuffle(values)

tries = 100
start = time.time()
for _ in range(tries):
    sort(values)
delta = time.time() - start
print('for loop', delta)

start = time.time()
for _ in range(tries):
    sort2(values)
delta = time.time() - start
print('comprehensions', delta)

start = time.time()
for _ in range(tries):
    sort_less_lookup(values)
delta = time.time() - start
print('less lookup', delta)
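
For repeatable measurements, the standard library's `timeit` module is one alternative to hand-rolled `time.time()` deltas. The sketch below is not part of the original benchmark: the function name `partition_by_average` and the `repeat`/`number` settings are arbitrary choices for illustration. It also returns both lists, which the functions above build but never hand back to the caller.

```python
import random
import timeit


def partition_by_average(values):
    """Split values into (lesser, greater_or_equal) around their mean."""
    average = sum(values) / len(values)
    lesser, greater = [], []
    # Bind the append methods once so the loop skips the attribute lookup
    l_append = lesser.append
    g_append = greater.append
    for value in values:
        if value >= average:
            g_append(value)
        else:
            l_append(value)
    return lesser, greater


values = list(range(100000))
random.shuffle(values)

# Run 10 calls per measurement and repeat the measurement 3 times
timings = timeit.repeat(lambda: partition_by_average(values), repeat=3, number=10)

# The minimum is usually the least noisy figure to compare
print('less lookup, best of 3 (10 calls each):', min(timings))
```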

3 answers

Answer 1
2 2020-05-23 18:22:24

Answer 2
0 2020-05-23 18:13:35

Answer 3
0 accepted 2020-05-23 18:56:53