
Algorithmic help in Python, find pair (x,y) where y/x > const

I'm building a rather large real-time odds system, and my bottleneck right now is the actual computation. I have a huge number of sorted lists, and for each list I need to find every pair (x,y) where (y/x) > const.

This is what I'm currently doing:

for f in reversed(xrange(1, len(odds))):
    found = False
    for s in xrange(0, f):
        try:
            edge = odds[s]/odds[f]
        except ZeroDivisionError:
            continue
        if edge > const:
            found = True
            yield odds[f], odds[s]
        else:
            break
    if not found:
        break

The plan is to stop as soon as I'm certain there are no more pairs. However, I'm doing this for an average of 40 lists each cycle, and I desperately need to shorten the cycle time. I'm curious about numpy and whether it can help me.

The length of each individual list is < 50.

Thanks for any help!

EDIT: This is an example list with the structure

(_ , odds1, odds2, odds3, _, _) (_ means not used):
[(260, Decimal('1.45'), Decimal('5.5'), Decimal('4'), 0, 2666298), (35549, Decimal('1.62'), Decimal('4.5'), Decimal('3.5'), 0, 2666298), (35551, Decimal('1.666'), Decimal('4.333'), Decimal('3.6'), 0, 2666298), (35552, Decimal('1.6'), Decimal('3.6'), Decimal('3.35'), 0, 2666298), (35553, Decimal('1.6'), Decimal('3.6'), Decimal('3.35'), 0, 2666298), (54453, Decimal('1.65'), Decimal('4.2'), Decimal('3.6'), 0, 2666298), (56234, Decimal('1.571'), Decimal('4.65'), Decimal('3.9'), 0, 2666298), (56911, Decimal('1.7'), Decimal('4.2'), Decimal('3.15'), 0, 2666298)]

I split this list into 3 lists, odds1_list, odds2_list and odds3_list, and do the computations on them. An example of odds1:

[Decimal('1.7'), Decimal('1.666'), Decimal('1.65'), Decimal('1.62'), Decimal('1.6'), Decimal('1.6'), Decimal('1.571'), Decimal('1.45')]
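
A minimal sketch of that split, assuming the rows have the structure shown above and that each list is sorted in descending order as in the odds1 example. The name `rows` is hypothetical, and only the first four rows of the example are used to keep it short:

```python
from decimal import Decimal

# First four rows of the example list: (_, odds1, odds2, odds3, _, _)
rows = [
    (260, Decimal('1.45'), Decimal('5.5'), Decimal('4'), 0, 2666298),
    (35549, Decimal('1.62'), Decimal('4.5'), Decimal('3.5'), 0, 2666298),
    (35551, Decimal('1.666'), Decimal('4.333'), Decimal('3.6'), 0, 2666298),
    (35552, Decimal('1.6'), Decimal('3.6'), Decimal('3.35'), 0, 2666298),
]

# Pull out one column per odds type and sort it from largest to smallest.
odds1_list = sorted((row[1] for row in rows), reverse=True)
odds2_list = sorted((row[2] for row in rows), reverse=True)
odds3_list = sorted((row[3] for row in rows), reverse=True)
```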

Then I need to identify all pairs (x,y) in this list where y/x > const.

If you have some list odds, you can do

from itertools import product
list(filter(lambda i: i[0] != 0 and i[1]/i[0] > 2, product(odds,repeat=2)))

For example

odds = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

Produces

[(1.0, 3.0), (1.0, 4.0), (1.0, 5.0), (1.0, 6.0), (1.0, 7.0), (1.0, 8.0), (1.0, 9.0),
 (2.0, 5.0), (2.0, 6.0), (2.0, 7.0), (2.0, 8.0), (2.0, 9.0),
 (3.0, 7.0), (3.0, 8.0), (3.0, 9.0),
 (4.0, 9.0)]

If the list is sorted in ascending order, then for each x you can binary-search the list for the first element greater than const*x; that element and every item after it match:

import numpy

odds = numpy.arange(10.)
const = 2.5

for x in odds:
    idx = numpy.searchsorted(odds, const*x, side='right')
    for y in odds[idx:]:
        print (x,y)

Running this gives

(0.0, 1.0)
(0.0, 2.0)
(0.0, 3.0)
(0.0, 4.0)
(0.0, 5.0)
(0.0, 6.0)
(0.0, 7.0)
(0.0, 8.0)
(0.0, 9.0)
(1.0, 3.0)
(1.0, 4.0)
(1.0, 5.0)
(1.0, 6.0)
(1.0, 7.0)
(1.0, 8.0)
(1.0, 9.0)
(2.0, 6.0)
(2.0, 7.0)
(2.0, 8.0)
(2.0, 9.0)
(3.0, 8.0)
(3.0, 9.0)
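
The lists in the question are sorted in descending order and hold Decimal values, so the same idea needs a small adaptation there. The following is only a sketch under those assumptions: `pairs_above` is a hypothetical helper name, all odds are assumed to be positive, and converting to float gives up Decimal's exact arithmetic.

```python
import numpy as np

def pairs_above(odds_desc, const):
    # Hypothetical helper: odds_desc is sorted in descending order, all values > 0.
    # Convert to floats and reverse, because searchsorted needs ascending order.
    asc = np.array([float(v) for v in odds_desc])[::-1]
    # For all x at once: position of the first y strictly greater than const * x.
    starts = np.searchsorted(asc, const * asc, side='right')
    values = asc.tolist()
    return [(x, y) for x, i in zip(values, starts) for y in values[i:]]

# The odds1 example from the question, written as floats.
odds1 = [1.7, 1.666, 1.65, 1.62, 1.6, 1.6, 1.571, 1.45]
pairs = pairs_above(odds1, 1.1)
```

Calling searchsorted once with the whole array of limits avoids one Python-level call per x, which may matter when this runs for many short lists per cycle.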

If I understand you correctly: given a sorted list of numbers over the index range [start, end), you want to find, for each index x, all indices y where list[y] > constant * list[x].

An algorithm might be:

Set the index y to the beginning of the list.
For each index x:
     Set limit := constant * list[x]
     Binary search an index y' in the range [y, end) where list[y'] > limit
     If the index y' is in the range [y, end):
         Add all pairs list[x], list[y''] where y'' is in the range [y', end)
            to the result set.
         Set y = y'
     Otherwise:
         No further results exist.
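
The steps above can be sketched in Python with the standard bisect module. This is an illustration, not part of the original answer: the function name `pairs_two_pointer` is hypothetical, and it assumes the list is sorted in ascending order with positive values and a positive constant, so the limit never decreases as x advances.

```python
from bisect import bisect_right

def pairs_two_pointer(values, constant):
    # Assumes: values sorted ascending, all values > 0, constant > 0.
    result = []
    n = len(values)
    y = 0  # search start; it only moves forward because the limit never decreases
    for x in range(n):
        limit = constant * values[x]
        # First index in [y, n) whose value is strictly greater than limit.
        y = bisect_right(values, limit, y, n)
        if y == n:
            break  # no further results exist for this or any later x
        result.extend((values[x], values[j]) for j in range(y, n))
    return result

result = pairs_two_pointer([1, 2, 3, 4, 5, 6, 7, 8, 9], 2)
```

Using bisect_right rather than bisect_left skips every element equal to the limit, which keeps the comparison strict even when the list contains duplicates.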

An implementation in C++ (you accidentally tagged the question that way):

#include <iostream>
#include <vector>
#include <algorithm>

int main ()
{
    const unsigned constant = 2;
    // The vector must be sorted in ascending order.
    std::vector<unsigned> v = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    auto y = v.begin();
    for(auto x = v.begin(); x < v.end() && y != v.end(); ++x) {
        std::cout << "x = " << *x << ", y: ";
        unsigned limit = constant * (*x);
        // upper_bound returns the first element strictly greater than limit,
        // so every duplicate equal to limit is skipped, not just the first one.
        y = std::upper_bound(y, v.end(), limit);
        for(auto r = y; r < v.end(); ++r)
            std::cout << *r << " ";
        std::cout << "\n";
    }
}

Here's an alternative that uses numpy and its broadcasting ability:

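import numpy as np

def find_pairs(odds, const):
    # odds must be a 1-D numpy array; odds / odds[:, None] is the matrix
    # whose entry [i, j] is odds[j] / odds[i].
    with np.errstate(divide='ignore'):
        pairs = odds[np.column_stack(np.where(odds / odds[:, None] > const))]
    return pairs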

In theory, the time complexity is O(n**2) (where n is the length of odds), but you say n is under 50, which is small enough that the theoretical complexity might not matter.
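
To make the broadcasting step easier to follow, here is the same computation unrolled on a tiny array. The three-element array and the intermediate names `ratios`, `rows` and `cols` are chosen only for illustration:

```python
import numpy as np

odds = np.array([1.0, 2.0, 5.0])
const = 2

# odds[:, None] has shape (3, 1), so the division broadcasts to a 3x3 matrix
# with ratios[i, j] == odds[j] / odds[i].
ratios = odds / odds[:, None]

# Row index i selects x, column index j selects y, for every y / x > const.
rows, cols = np.where(ratios > const)
pairs = np.column_stack((odds[rows], odds[cols]))
print(pairs)
```

Each row of `pairs` is one (x, y) pair; here only 5.0/1.0 and 5.0/2.0 exceed the constant.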

Here's a full script that includes some of the other answers (so far):

from itertools import product
import numpy as np


def find_pairs(odds, const):
    with np.errstate(divide='ignore'):
        pairs = odds[np.column_stack(np.where(odds / odds[:, None] > const))]
    return pairs


def dursi(odds, const):
    for x in odds:
        idx = np.searchsorted(odds, const*x, side='right')
        for y in odds[idx:]:
            yield (x,y)


def cyber(odds, const):
    return list(filter(lambda i: i[1]/i[0] > const, product(odds, repeat=2)))

And here's a timing comparison, using a numpy array with 50 elements:

In [122]: const = 1.25

In [118]: odds = np.sort(1 + np.random.rand(50))

In [119]: %timeit find_pairs(odds, const)
10000 loops, best of 3: 34.9 µs per loop

In [120]: %timeit list(dursi(odds, const))
10000 loops, best of 3: 150 µs per loop

In [121]: %timeit cyber(odds, const)
1000 loops, best of 3: 541 µs per loop

In this case, the vectorized calculation in find_pairs has enough of an advantage over explicit Python loops that it is faster than the others.
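
Outside IPython, a comparison along these lines could be reproduced with the standard timeit module. This is a rough sketch rather than the benchmark used above: the seed, the repeat count and the agreement check are assumptions added for illustration, and the absolute numbers will differ from machine to machine.

```python
import timeit
import numpy as np

def find_pairs(odds, const):
    with np.errstate(divide='ignore'):
        return odds[np.column_stack(np.where(odds / odds[:, None] > const))]

def dursi(odds, const):
    for x in odds:
        idx = np.searchsorted(odds, const * x, side='right')
        for y in odds[idx:]:
            yield (x, y)

rng = np.random.default_rng(0)      # fixed seed, chosen arbitrarily
odds = np.sort(1 + rng.random(50))  # 50 sorted values in [1, 2)
const = 1.25

# Check that both approaches return the same pairs before timing them.
a = sorted(map(tuple, find_pairs(odds, const).tolist()))
b = sorted((float(x), float(y)) for x, y in dursi(odds, const))
same = a == b

n = 200  # number of calls per measurement
t_vec = timeit.timeit(lambda: find_pairs(odds, const), number=n) / n
t_loop = timeit.timeit(lambda: list(dursi(odds, const)), number=n) / n
print("same pairs:", same)
print("find_pairs: %.1f us, dursi: %.1f us" % (t_vec * 1e6, t_loop * 1e6))
```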
