
Merge of lazy streams (using generators) in Python

I'm playing with the functional capabilities of Python 3 and I tried to implement the classical algorithm for calculating Hamming numbers. Those are the numbers which have only 2, 3 or 5 as their prime factors. The first Hamming numbers are 2, 3, 4, 5, 6, 8, 10, 12, 15, 16, 18, 20 and so on.

My implementation is the following:

def scale(s, m):
    return (x*m for x in s)

def merge(s1, s2):
    it1, it2 = iter(s1), iter(s2)
    x1, x2 = next(it1), next(it2)
    if x1 < x2:
        x = x1
        it = iter(merge(it1, s2))
    elif x1 > x2:
        x = x2
        it = iter(merge(s1, it2))
    else:
        x = x1
        it = iter(merge(it1, it2))
    yield x
    while True: yield next(it)

def integers():
    n = 0
    while True:
        n += 1
        yield n

m2 = scale(integers(), 2)
m3 = scale(integers(), 3)
m5 = scale(integers(), 5)

m23 = merge(m2, m3)

hamming_numbers = merge(m23, m5)

The problem is that merge just doesn't seem to work. Before that I implemented the Sieve of Eratosthenes in the same way, and it worked perfectly okay:

def sieve(s):
    it = iter(s)
    x = next(it)
    yield x
    it = iter(sieve(filter(lambda y: x % y, it)))
    while True: yield next(it)

This one uses the same techniques as my merge operation, so I can't see any difference. Do you have any ideas?

(I know that all of these can be implemented in other ways, but my goal is precisely to understand generators and the pure functional capabilities of Python, including recursion, without using class declarations or special pre-built Python functions.)

UPD: For Will Ness, here's my implementation of this algorithm in LISP (Racket, actually):

(define (scale str m)
  (stream-map (lambda (x) (* x m)) str))

(define (integers-from n)
  (stream-cons n
               (integers-from (+ n 1))))

(define (merge s1 s2)
  (let ((x1 (stream-first s1))
        (x2 (stream-first s2)))
    (cond ((< x1 x2)
           (stream-cons x1 (merge (stream-rest s1) s2)))
          ((> x1 x2)
           (stream-cons x2 (merge s1 (stream-rest s2))))
          (else
           (stream-cons x1 (merge (stream-rest s1) (stream-rest s2)))))))


(define integers (integers-from 1))

(define hamming-numbers
  (stream-cons 1 (merge (scale hamming-numbers 2)
                        (merge (scale hamming-numbers 3)
                               (scale hamming-numbers 5)))))

Your algorithm is incorrect. Your m2, m3, m5 should be scaling hamming_numbers, not integers.

The major problem is this: your merge() calls next() for both its arguments unconditionally, so both get advanced one step. So after producing the first number, e.g. 2 for the m23 generator, on the next invocation it sees its 1st argument as 4(,6,8,...) and its 2nd as 6(,9,12,...). The 3 is already gone. So it always pulls both its arguments, and always returns the head of the 1st (test entry at http://ideone.com/doeX2Q ).
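A quick check makes the symptom visible. This snippet is not from the original post; it just reuses the question's scale(), integers() and merge(), with itertools.islice taking a finite prefix of the infinite stream:

from itertools import islice

broken = merge(scale(integers(), 2), scale(integers(), 3))
print(list(islice(broken, 8)))
# prints [2, 4, 6, 8, 10, 12, 14, 16]: only the first stream's heads
# ever come out; 3, 9, 15, ... are consumed and silently dropped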

Calling iter() is totally superfluous; it adds nothing here. When I remove it ( http://ideone.com/7tk85h ), the program works exactly the same and produces exactly the same (wrong) output. Normally iter() serves to create a lazy iterator object, but its arguments here are already such generators.

There's no need to call iter() in your sieve() either ( http://ideone.com/kYh7Di ). sieve() already defines a generator, and filter() in Python 3 creates an iterator from a function and an iterable (generators are iterable). See also e.g. Difference between Python's Generators and Iterators.
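Indeed, a generator in Python 3 is already its own iterator, and so is a filter object; a two-line check confirms it:

g = (n*n for n in range(5))  # a generator expression
print(iter(g) is g)          # True: iter() on an iterator returns it unchanged

f = filter(None, [0, 1, 2])  # a filter object
print(iter(f) is f)          # True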

We can do the merge like this instead:

def merge(s1, s2):
  x1, x2 = next(s1), next(s2)
  while True:
    if x1 < x2:
        yield x1
        x1 = next(s1)  # advance only the stream whose head was emitted
    elif x1 > x2:
        yield x2
        x2 = next(s2)
    else:
        yield x1       # equal heads: yield once, advance both (dedupes)
        x1, x2 = next(s1), next(s2)
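With merge fixed, the other correction, scaling hamming_numbers itself instead of integers, can be sketched in Python too. The version below is not from the original answer; it uses itertools.tee to give the self-referential stream the independent readers that the Racket version gets from stream memoization (tee's internal cache plays that role here, holding the history between the fastest and the slowest reader):

from itertools import tee, islice

def hamming_numbers():
    def gen():
        # 1, followed by the merge of the stream's own scaled copies
        yield 1
        for x in merge(merge(scale(h2, 2), scale(h3, 3)), scale(h5, 5)):
            yield x
    # four independent readers over one underlying generator:
    # three feed the self-reference, one is handed to the caller
    h2, h3, h5, result = tee(gen(), 4)
    return result

print(list(islice(hamming_numbers(), 15)))
# [1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24]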

Recursion in itself is non-essential in defining the sieve() function, too. In fact it only serves to obscure an enormous deficiency of that code. Any prime it produces will be tested by all the primes below it in value, but only those below its square root are truly needed. We can fix that quite easily in a non-recursive style ( http://ideone.com/Qaycpe ):

def sieve(s):    # call as: sieve( integers_from(2))
    x = next(s)
    yield x
    ps = sieve( integers_from(2))           # independent primes supply
    p = next(ps)
    q = p*p                                 # debug trace: print((p, q))
    while True:
        x = next(s)
        while x < q:
            yield x
            x = next(s)
        # here x == q
        s = filter(lambda y, p=p: y % p, s) # filter creation postponed
        p = next(ps)                        #   until square of p seen in input
        q = p*p
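The code assumes an integers_from() helper like the one in the Racket version. With that defined, a quick test run (again, not part of the original answer) produces the expected primes:

from itertools import islice

def integers_from(n):  # the supply assumed by the comment in sieve()
    while True:
        yield n
        n += 1

print(list(islice(sieve(integers_from(2)), 10)))
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]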

This is now much, much, much more efficient (see also: Explain this chunk of Haskell code that outputs a stream of primes ).

Recursive or not is just a syntactic characteristic of the code. The actual run-time structures are the same: the filter() adaptors being hoisted on top of an input stream, either at the appropriate moments, or way too soon (so we'd end up with way too many of them).

I will propose a different approach: using Python's heapq (a min-heap) with a generator (lazy evaluation), if you don't insist on keeping the merge() function.

from heapq import heappush, heappop

def hamming_numbers(n):
    ans = [1]            # min-heap seeded with the first Hamming number

    last = 0             # most recently yielded value, to skip duplicates
    count = 0

    while count < n:
        x = heappop(ans) # smallest candidate so far

        if x > last:     # duplicates (e.g. 6 = 2*3 = 3*2) pop more than once
            yield x

            last = x
            count += 1

            # every Hamming number spawns three new candidates
            heappush(ans, 2*x)
            heappush(ans, 3*x)
            heappush(ans, 5*x)


>>> n = 20
>>> print(list(hamming_numbers(20)))
[1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25, 27, 30, 32, 36]
