简体   繁体   中英

Using generators to make lazy streams in python

I've been messing with streams in python and been trying to generate the Hamming numbers (all the numbers with prime factors of 2, 3, or 5 only). The standard way for doing so, described by Dijkstra, is to observe that:

  1. The sequence of Hamming numbers begins with 1.
  2. The remaining values in the sequence are of the form 2h, 3h, and 5h, where h is any Hamming number.
  3. h is be generated by outputting the value 1, and then merging together 2h, 3h, and 5h

My implementation is this:

def hamming():
    yield 1
    yield from merge(scale_stream(hamming(), 2), scale_stream(hamming(), 3))

def merge(s1, s2):
  x1, x2 = next(s1), next(s2)
  while True:
    if x1 < x2:
        yield x1
        x1 = next(s1)
    elif x1 > x2:
        yield x2
        x2 = next(s2)
    else:
        yield x1
        x1, x2 = next(s1), next(s2)

def scale_stream(stream, scalar):
    for e in stream:
        yield e * scalar

def stream_index(stream, n):
    for i, e in enumerate(stream):
        if i+1 == n:
            return e

print(stream_index(hamming(), 300))

This does correctly produce the stream of Hamming numbers, however for whatever reason it takes more and more time the longer it generates, even though in theory the time complexity should be O(N).

I have played around with other streams before but my intuition for them is pretty weak so I have no idea what is going on here. I think the issue is in the recursive way I defined hamming(); I don't know if it is an issue that every call to hamming might spawn a new version of the process that has to run in parallel thereby slowing it down.

Honestly though like I said I have a very poor idea of what actually happens when I run it and debugging has gotten me nowhere, so if someone with more experience can enlighten me I would really appreciate it.

The further you get out into your stream, the more duplicates you're going to have to be merging. The number 2**4 * 3 **4 = 1296 is going to appear 70 times in your multiple stream (8 choose 4), and your program is going to be spending more time merging duplicates than it is outputting new items.

The further you go out, the more duplication you'r going to be dealing with. There is no reason to expect your program to be linear.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM