How to speed up a nested for loop with index search in Python

I get values from an order book as a list like this:

list1 = [...,'ethbtc', '0.077666', '10', '0.077680', '15',...]
             ^symbol   ^value      ^quantity

There are around 100 symbols in this list and 40 values for each symbol. They are always in the same order.
I would like to find out the maximum price my system would buy at right now if I spent, say, 100% of my balance.

So if I want to buy 11 ETH at 0.077666, the real price would be 0.077680, because only 10 ETH are available at the first price.
I don't want the average price, because that would be too much for now.

My code has a nested for loop and loops through 2 lists:

  1. the list of coins, where all 100 symbols are listed, like this: symbollist = [ethbtc, eoseth,...]
  2. a list of indexes called a, because the values and quantities are always at the same positions
    a = ['1', '3', '5', ...]

My code:

for symbolnow in symbollist:
    sumlist = []
    for i in a:
        quantity = float(list1[list1.index(symbolnow) + (i+1)] if symbolnow in list1 else 0)
        sumlist.append(quantity)
        if sum(sumlist) > mycurrentbalance:
            maxvalue = float(list1[list1.index(symbolnow) + i] if symbolnow in list1 else -1)
            break
        else:
            maxvalue = -1

So what does this code do:
1) loop through every symbol in symbollist
2) for every symbol found, look up the available quantity
3) if my balance (e.g. 10 ETH) is smaller than the quantity, the loop breaks
4) if not, keep searching and adding every quantity to a sum list until there is enough.

The code works as intended, but it is not very fast. As expected, list1.index takes a long time to execute.

Question
How would faster code work? Is a list comprehension better in this scenario, or even regex? Is my code very ugly?

Thank you in advance!

EDIT:
To clarify the input and desired output, a sample:

list1 = [...,'ethbtc', '0.077666', '1', '0.077680', '1.5', '0.077710', '3', '0.078200', '4',...]
mycurrentbalance = 5.5  <-- balance is in ETH
Every second entry after the symbol in list1 is a quantity in ETH, so here the quantities are ['1', '1.5', '3', '4'].

So if I want to sell all of my ETH (in this case 5.5), the max value would be '0.077710'.

list1 contains 100 symbols, so before and after 'ethbtc' there are other values, quantities and symbols.

Preprocess list1 and store it in a dict. This means you only iterate over list1 once, instead of every time your inner loop runs.

price_dict = {'ethbtc': ['0.077666', '10', '0.077680', '15'], 'btceth': [...], ...}
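
One possible way to build such a dict in a single pass is sketched below. The helper name build_price_dict and the 'eoseth' numbers are made up for illustration, and the sketch assumes that every symbol in the flat list is also present in symbollist.

```python
def build_price_dict(flat_list, symbols):
    """Group a flat order-book list into {symbol: [price, qty, price, qty, ...]}."""
    symbol_set = set(symbols)   # set membership is O(1) on average
    price_dict = {}
    current = None              # list of the symbol currently being filled
    for item in flat_list:
        if item in symbol_set:
            current = price_dict.setdefault(item, [])
        elif current is not None:   # skip anything before the first symbol
            current.append(item)
    return price_dict


# Hypothetical sample data; the 'eoseth' values are invented.
list1 = ['ethbtc', '0.077666', '10', '0.077680', '15',
         'eoseth', '0.012000', '200', '0.012100', '50']
price_dict = build_price_dict(list1, ['ethbtc', 'eoseth'])
print(price_dict['ethbtc'])
```

After this step, every lookup is price_dict[symbol] and list1.index is no longer needed.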

Instead of iterating over a, iterate over a range (Python 3) or xrange (Python 2). This generates the indexes lazily instead of storing them in a list, and makes your code more flexible.

range(0, len(price_dict[symbol]), 2)
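
As a rough sketch of how that range could drive the inner loop: the function name max_price is hypothetical, the entries are assumed to alternate price, quantity, and the comparison uses >= so that a balance of 5.5 stops at 0.077710 as in the question's sample (the question's own code uses >).

```python
def max_price(entries, balance):
    """Return the price level at which the cumulative quantity covers
    `balance`, or None if the book is not deep enough."""
    total = 0.0
    for i in range(0, len(entries), 2):   # i -> price, i + 1 -> quantity
        total += float(entries[i + 1])
        if total >= balance:
            return float(entries[i])
    return None


price_dict = {
    'ethbtc': ['0.077666', '1', '0.077680', '1.5',
               '0.077710', '3', '0.078200', '4'],
}
print(max_price(price_dict['ethbtc'], 5.5))
```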

In your case I think a slice object would help with your 'a' loop, if there is a fixed interval. You can save a list slice to an object, as shown below (along with one or two other tips). I agree with the user above that if you have a chance to pre-process the input data, you really should. I would recommend the pandas library for that, because it is very fast, but dictionaries also give you hash-based lookups.

input_data = ['ethbtc', '0.077666', '10', '0.077680', '15']  # Give your variables meaningful names

length = 20 # how many value/quantity entries follow a particular symbol

for symbol in symbollist: # Use meaningful names in loops too
    start = input_data.index(symbol)  # break up longer lines
    # Some exception handling here
    # python lets you create slice objects; this one selects the quantities,
    # which sit at start+2, start+4, ... (start itself is the symbol)
    indxs = slice(start + 2, start + length + 1, 2)
    quantities = [float(number) for number in input_data[indxs]]

    if sum(quantities) > mycurrentbalance:
        # Whatever code here
        ...
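
The "Some exception handling here" placeholder could be filled in roughly as follows. This is only a sketch: quantities_for is a made-up helper name, and returning an empty list for a missing symbol is one choice among several.

```python
def quantities_for(symbol, data, length):
    """Return the quantities listed after `symbol`, or [] if it is absent."""
    try:
        start = data.index(symbol)   # list.index raises ValueError if missing
    except ValueError:
        return []
    # Quantities are every second entry, starting two positions after the symbol.
    qty_slice = slice(start + 2, start + length + 1, 2)
    return [float(x) for x in data[qty_slice]]


input_data = ['ethbtc', '0.077666', '1', '0.077680', '1.5',
              '0.077710', '3', '0.078200', '4']
print(quantities_for('ethbtc', input_data, 8))
```

Note that this still calls index once per symbol, so it tidies the loop but does not remove the repeated linear search.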

In addition to the answer from user3080953, you should preprocess your data not only because it is more efficient, but also because it helps you handle the complexity. Here, you are doing two things at once: decoding your list and using the data. First decode, then use.

The target format should be, in my opinion:

prices_and_quantities_by_symbol = {
    'ethbtc': {
        'prices':[0.077666, 0.077680, 0.077710, 0.078200], 
        'quantities':[1, 1.5, 3, 4]
    }, 
    'btceth': {
        ...
    }, 
...}

Now, you just have to do:

for symbol, prices_and_quantities in prices_and_quantities_by_symbol.items(): # O(len(symbol_list))
    total = 0
    for p, q in zip(prices_and_quantities["prices"], prices_and_quantities["quantities"]): # O(len(quantities))
        total += q # the running sum
        if total >= my_current_balance:
            yield symbol, p # this will yield the symbol and the associated max_value
            break
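
Since yield is only valid inside a function, the loop above has to live in a generator. A minimal runnable sketch, with the hypothetical function name max_values and the sample numbers from the question:

```python
def max_values(prices_and_quantities_by_symbol, my_current_balance):
    """Yield (symbol, price) for each symbol whose book can absorb the balance."""
    for symbol, pq in prices_and_quantities_by_symbol.items():
        total = 0.0
        for p, q in zip(pq["prices"], pq["quantities"]):
            total += q                        # the running sum
            if total >= my_current_balance:
                yield symbol, p
                break


book = {
    'ethbtc': {
        'prices': [0.077666, 0.077680, 0.077710, 0.078200],
        'quantities': [1, 1.5, 3, 4],
    },
}
print(list(max_values(book, 5.5)))
```

Symbols whose total quantity never reaches the balance are simply not yielded.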

How do you get the data into the target format? Just iterate over the list and, whenever you find a symbol, start storing the values and quantities until the next symbol:

prices_and_quantities_by_symbol = {}
symbol_set = set(symbol_list) # O(len(symbol_list))
for i, v in enumerate(list1): # O(len(list1))
    if v in symbol_set:  # amortized O(1) lookup
        current_prices = []
        current_quantities = []
        current_start = i+1
        prices_and_quantities_by_symbol[v] = {
            'prices':current_prices, 
            'quantities':current_quantities
        }
    else: # a value or a quantity
        (current_prices if (i-current_start)%2==0 else current_quantities).append(float(v))
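
Wrapped in a function, the decoding step might look like the sketch below. The name decode is hypothetical, and the guard for entries that appear before the first symbol is an addition that the snippet above does not have.

```python
def decode(list1, symbol_list):
    """Turn the flat list into {symbol: {'prices': [...], 'quantities': [...]}}."""
    by_symbol = {}
    symbol_set = set(symbol_list)
    current_prices = current_quantities = None
    current_start = 0
    for i, v in enumerate(list1):
        if v in symbol_set:
            current_prices, current_quantities = [], []
            current_start = i + 1
            by_symbol[v] = {'prices': current_prices,
                            'quantities': current_quantities}
        elif current_prices is not None:   # ignore entries before the first symbol
            is_price = (i - current_start) % 2 == 0
            (current_prices if is_price else current_quantities).append(float(v))
    return by_symbol


list1 = ['ethbtc', '0.077666', '1', '0.077680', '1.5',
         '0.077710', '3', '0.078200', '4']
print(decode(list1, ['ethbtc']))
```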

There is a small but interesting optimization, especially if your lists of quantities/values are long. Don't store the quantities but the running total of quantities:

prices_and_running_totals_by_symbol = {
    'ethbtc': {
        'prices':[0.077666, 0.077680, 0.077710, 0.078200], 
        'running_total':[1, 2.5, 5.5, 9.5]
    }, 
    'btceth': {
        ...
    }, 
...}

Now you can find your max_value very quickly using bisect. The code also becomes easier to understand, since bisect.bisect_left(rts, my_current_balance) returns the index of the first running total >= my_current_balance:

for symbol, prices_and_running_totals in prices_and_running_totals_by_symbol.items(): # O(len(symbol_list))
    ps = prices_and_running_totals["prices"]
    rts = prices_and_running_totals["running_total"]
    i = bisect.bisect_left(rts, my_current_balance) # O(log(len(rts)))
    yield symbol, ps[i] # this will yield the symbol and the associated max_value
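
One caveat: when the balance exceeds the last running total, bisect_left returns len(rts) and ps[i] raises an IndexError. A sketch of a guarded generator follows; skipping such symbols is an assumption about the desired behaviour, not something the question specifies.

```python
import bisect


def symbols_max_values(prices_and_running_totals_by_symbol, my_current_balance):
    """Yield (symbol, price) using a binary search over the running totals."""
    for symbol, data in prices_and_running_totals_by_symbol.items():
        ps = data["prices"]
        rts = data["running_total"]   # ascending, as running totals of quantities
        i = bisect.bisect_left(rts, my_current_balance)   # O(log(len(rts)))
        if i < len(ps):               # otherwise the book is not deep enough
            yield symbol, ps[i]


book = {
    'ethbtc': {
        'prices': [0.077666, 0.077680, 0.077710, 0.078200],
        'running_total': [1, 2.5, 5.5, 9.5],
    },
}
print(list(symbols_max_values(book, 5.5)))
```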

To build the running totals, you have to handle the prices and the quantities differently:

# O(len(list1))
...
if v in symbol_set:  # amortized O(1) lookup
    ...
elif (i-current_start)%2==0:
    current_prices.append(float(v))
else:
    current_running_totals.append((current_running_totals[-1] if current_running_totals else 0.0) + float(v))
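
Filling in the elided parts, a complete decoding function could look like this sketch. The two-argument signature is an assumption made so that the example is self-contained, and the guard for entries before the first symbol is an addition.

```python
def process_data(list1, symbol_list):
    """Decode the flat list into prices and running totals per symbol."""
    symbol_set = set(symbol_list)
    result = {}
    current_prices = current_running_totals = None
    current_start = 0
    for i, v in enumerate(list1):
        if v in symbol_set:
            current_prices, current_running_totals = [], []
            current_start = i + 1
            result[v] = {'prices': current_prices,
                         'running_total': current_running_totals}
        elif current_prices is None:
            continue                       # entry before the first symbol
        elif (i - current_start) % 2 == 0:
            current_prices.append(float(v))
        else:
            previous = current_running_totals[-1] if current_running_totals else 0.0
            current_running_totals.append(previous + float(v))
    return result


list1 = ['ethbtc', '0.077666', '1', '0.077680', '1.5',
         '0.077710', '3', '0.078200', '4']
print(process_data(list1, ['ethbtc'])['ethbtc']['running_total'])
```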

Put everything into functions (or better, methods of a class):

prices_and_running_totals_by_symbol = process_data(list1)
for symbol, max_value in symbols_max_values(prices_and_running_totals_by_symbol, my_current_balance):
    print(symbol, max_value)

You can see how, by splitting the problem into two parts (decode and use), the code becomes faster and (in my opinion) easier to understand (I didn't include comments, but they should be there).
