Python optimization

Question

f = open('wl4.txt', 'w')
hh = 0
######################################
for n in range(1,5):
    for l in range(33,127):
        if n==1:
            b = chr(l) + '\n'
            f.write(b)
            hh += 1 
        elif n==2:           
            for s0 in range(33, 127):
                b = chr(l) + chr(s0) + '\n'
                f.write(b)
                hh += 1
        elif n==3:          
            for s0 in range(33, 127):
                for s1 in range(33, 127):
                    b = chr(l) + chr(s0) + chr(s1) + '\n'
                    f.write(b)
                    hh += 1 
        elif n==4:    
            for s0 in range(33, 127):
                for s1 in range(33, 127):
                    for s2 in range(33,127):
            b = chr(l) + chr(s0) + chr(s1) + chr(s2) + '\n'
            f.write(b)
            hh += 1 
######################################
print "We Made %d Words." %(hh)
######################################
f.close()

So, is there any method to make it faster?

Answer 1

Further significant improvements are possible.

The following script file demonstrates these, using (for brevity) only the size 4 loop (which takes up well over 90% of the time).

method 0: the OP's original code

method 1: John Kugleman's solution

method 2: (1) and move some string concatenation out of inner loops

method 3: (2) and put the code inside a function -- accessing local variables is MUCH faster than global variables. Any script can do this. Many scripts should do this.

method 4: (3) and accumulate strings in a list then join them and write them. Note that this uses memory like you may not believe. My code doesn't attempt to do it for the whole file, because (127 - 33) ** 4 is 78M strings. On a 32-bit box, that's 78 * 4 = 312Mb for the list alone (ignoring unused memory at the end of the list), plus 78 * 28 = 2184 Mb for the str objects (sys.getsizeof("1234") produces 28), plus 78 * 5 = 390 Mb for the join result. You just blew your user address space or your ulimit or something else blowable. Or if you have 1 Gb of real memory of which 128Mb has been snarfed by the video driver, but enough swap space, you have time for lunch (if running a particular OS, dinner as well).

method 5: (4) and don't ask the list for the whereabouts of its append attribute 78 million times :-)

Here is the script file:

import time, sys
time_function = time.clock # Windows; time.time may be better on *x
ubound, which = map(int, sys.argv[1:3])
t0 = time_function()
if which == 0:
    ### original ###
    f = open('wl4.txt', 'w')
    hh = 0
    n = 4
    for l in range(33, ubound):
        if n == 1:
            pass
        elif n == 2:
            pass
        elif n == 3:
            pass
        elif n == 4:
            for s0 in range(33, ubound):
                for s1 in range(33, ubound):
                    for s2 in range(33,ubound):
                        b = chr(l) + chr(s0) + chr(s1) + chr(s2) + '\n'
                        f.write(b)
                        hh += 1
    f.close()
elif which == 1:
    ### John Kugleman ###
    f = open('wl4.txt', 'w')
    chars = [chr(c) for c in range(33, ubound)]
    hh = 0
    for l in chars:
        for s0 in chars:
            for s1 in chars:
                for s2 in chars:
                    b = l + s0 + s1 + s2 + '\n'
                    f.write(b)
                    hh += 1
    f.close()
elif which == 2:
    ### JohnK, saving + ###
    f = open('wl4.txt', 'w')
    chars = [chr(c) for c in range(33, ubound)]
    hh = 0
    for L in chars: # "L" as in "Legible" ;-)
        for s0 in chars:
            b0 = L + s0
            for s1 in chars:
                b1 = b0 + s1
                for s2 in chars:
                    b = b1 + s2 + '\n'
                    f.write(b)
                    hh += 1
    f.close()
elif which == 3:
    ### JohnK,  saving +, function ###
    def which3func():
        f = open('wl4.txt', 'w')
        chars = [chr(c) for c in range(33, ubound)]
        nwords = 0
        for L in chars:
            for s0 in chars:
                b0 = L + s0
                for s1 in chars:
                    b1 = b0 + s1
                    for s2 in chars:
                        b = b1 + s2 + '\n'
                        f.write(b)
                        nwords += 1
        f.close()
        return nwords
    hh = which3func()
elif which == 4:
    ### JohnK, saving +, function, linesep.join() ###
    def which4func():
        f = open('wl4.txt', 'w')
        chars = [chr(c) for c in range(33, ubound)]
        nwords = 0
        for L in chars:
            accum = []
            for s0 in chars:
                b0 = L + s0
                for s1 in chars:
                    b1 = b0 + s1
                    for s2 in chars:
                        accum.append(b1 + s2)
            nwords += len(accum)
            accum.append("") # so that we get a final newline
            f.write('\n'.join(accum))
        f.close()
        return nwords
    hh = which4func()
elif which == 5:
    ### JohnK, saving +, function, linesep.join(), avoid method lookup in loop ###
    def which5func():
        f = open('wl4.txt', 'w')
        chars = [chr(c) for c in range(33, ubound)]
        nwords = 0
        for L in chars:
            accum = []; accum_append = accum.append
            for s0 in chars:
                b0 = L + s0
                for s1 in chars:
                    b1 = b0 + s1
                    for s2 in chars:
                        accum_append(b1 + s2)
            nwords += len(accum)
            accum_append("") # so that we get a final newline
            f.write('\n'.join(accum))
        f.close()
        return nwords
    hh = which5func()
else:
    print "Bzzzzzzt!!!"
t1 = time_function()
print "Method %d made %d words in %.1f seconds" % (which, hh, t1 - t0)

Here are some results:

C:\junk\so>for %w in (0 1 2 3 4 5) do \python26\python wl4.py 127 %w

C:\junk\so>\python26\python wl4.py 127 0
Method 0 made 78074896 words in 352.3 seconds

C:\junk\so>\python26\python wl4.py 127 1
Method 1 made 78074896 words in 183.9 seconds

C:\junk\so>\python26\python wl4.py 127 2
Method 2 made 78074896 words in 157.9 seconds

C:\junk\so>\python26\python wl4.py 127 3
Method 3 made 78074896 words in 126.0 seconds

C:\junk\so>\python26\python wl4.py 127 4
Method 4 made 78074896 words in 68.3 seconds

C:\junk\so>\python26\python wl4.py 127 5
Method 5 made 78074896 words in 60.5 seconds

Update in response to OP's questions

""" When I try to add for loops, i got a memory error for accum_append.. what is the problem ??"""

I don't know what the problem is; I can't read your code at this distance. Guess: If you are trying to do length == 5, you have probably got the accum initialisation and writing bits in the wrong place, and accum is trying to grow beyond the capacity of your system's memory (as I hoped I'd explained earlier).

"""Now Method 5 is the fastest one, but its make a word tell length 4.. how could i make how much i want ?? :)"""

You have two choices: (1) you continue to use nested for loops (2) you look at the answers that don't use nested for loops, with the length specified dynamically.

Methods 4 and 5 got speedups by using accum but the manner of doing that was tailored to the exact knowledge of how much memory would be used.

Below are 3 more methods. 101 is tgray's method with no extra memory use. 201 is Paul Hankin's method (plus some write-to-file code) similarly with no extra memory use. These two methods are of about the same speed and are within sight of method 3 speedwise. They both allow dynamic specification of the desired length.

Method 102 is tgray's method with a fixed 1Mb buffer -- it attempts to save time by reducing the number of calls to f.write() ... you may wish to experiment with the buffer size. You could create an orthogonal 202 method if you wished. Note that tgray's method uses itertools.product for which you'll need Python 2.6, whereas Paul Hankin's method uses generator expressions which have been around for a while.

elif which == 101:
    ### tgray, memory-lite version
    def which101func():
        f = open('wl4.txt', 'w')
        f_write = f.write
        nwords = 0
        chars = map(chr, xrange(33, ubound))  # create a list of characters
        length = 4 #### length is a variable
        for x in product(chars, repeat=length):
            f_write(''.join(x) + '\n')
            nwords += 1
        f.close()
        return nwords
    hh = which101func()
elif which == 102:
    ### tgray, memory-lite version, buffered
    def which102func():
        f = open('wl4.txt', 'w')
        f_write = f.write
        nwords = 0
        chars = map(chr, xrange(33, ubound))  # create a list of characters
        length = 4 #### length is a variable
        buffer_size_bytes = 1024 * 1024
        buffer_size_words = buffer_size_bytes // (length + 1)
        words_in_buffer = 0
        buffer = []; buffer_append = buffer.append
        for x in product(chars, repeat=length):
            words_in_buffer += 1
            buffer_append(''.join(x) + '\n')
            if words_in_buffer >= buffer_size_words:
                f_write(''.join(buffer))
                nwords += words_in_buffer
                words_in_buffer = 0
                del buffer[:]
        if buffer:
            f_write(''.join(buffer))
            nwords += words_in_buffer
        f.close()
        return nwords
    hh = which102func()
elif which == 201:
    ### Paul Hankin (needed output-to-file code added)
    def AllWords(n, CHARS=[chr(i) for i in xrange(33, ubound)]):
        #### n is the required word length
        if n == 1: return CHARS
        return (w + c for w in AllWords(n - 1) for c in CHARS)
    def which201func():
        f = open('wl4.txt', 'w')
        f_write = f.write
        nwords = 0
        for w in AllWords(4):
            f_write(w + '\n')
            nwords += 1
        f.close()
        return nwords
    hh = which201func()

Answer 2

You can create the range(33, 127) once and save it off. Not having to create it repeatedly cuts the runtime in half on my machine.

chars = [chr(c) for c in range(33, 127)]

...

for s0 in chars:
    for s1 in chars:
        for s2 in chars:
            b = l + s0 + s1 + s2 + '\n'
            f.write(b)
            hh += 1

Answer 3

The outer loop seems pretty pointless. Why not simply:

for l in range(33,127)
   .. your code for the n==1 case

for l in range(33,127)
   .. your code for the n==2 case

for l in range(33,127)
   .. your code for the n==3 case

for l in range(33,127)
   .. your code for the n==4 case

That will be both faster and easier read.

Answer 4

When doing operations that involve iterating a lot, a good place to start is the itertools package.

In this case it looks like you want the product function . Which gives you the:

cartesian product, equivalent to a nested for-loop

So to get a list of the "words" you are creating:

from itertools import product

chars = map(chr, xrange(33,127))  # create a list of characters
words = []                        # this will be the list of words

for length in xrange(1, 5):       # length is the length of the words created
    words.extend([''.join(x) for x in product(chars, repeat=length)])

# instead of keeping a separate counter, hh, we can use the len function
print "We Made %d Words." % (len(words))  

f = open('wl4.txt', 'w')
f.write('\n'.join(words))         # write one word per line
f.close()

As a result we get the result that your script gives us. And since itertools is implemented in c , it is also faster.

Edit:

Per John Machin's very astute comment about the memory usage, here's updated code that doesn't give me a memory error when I run it on the whole range(33, 127).

from itertools import product

chars = map(chr, xrange(33,127))  # create a list of characters
f_words = open('wl4.txt', 'w')

num_words = 0                     # a counter (was hh in OPs code)
for length in xrange(1, 5):       # length is the length of the words created
    for char_tup in product(chars, repeat=length):
        f_words.write(''.join(char_tup) + '\n')
        num_words += 1

f.close()
print "We Made %d Words." % (num_words)

This runs in about 4 minutes (240 seconds) on my machine.

Answer 5

How about this it workes with arbitary word lengths: (Password generator?)

f = open('wl4.txt', 'w')
hh=0
chars = map(chr,xrange(33, 127))

def func(n, result):
    if (n == 0):
        f.write(result + "\n")
        hh +=1
    else:
        for c in chars:
            func(n-1, result+c)

for n in range(1, 5):
    func(n,"")
######################################   
print "We Made %d Words." %(hh)   
######################################   
f.close()

Answer 6

Do you need all of the words sorted by their length? If you can mingle the lengths together, you can improve slightly on John Kugelman's answer like this:

f = open("wl4.txt", "w")

chars = [chr(c) for c in range(33, 127)]
c = len(chars)
count = c + c*c + c**3 + c**4

for c0 in chars:
    print >>f, c0
    for c1 in chars:
        s1 = c0 + c1
        print >>f, s1
        for c2 in chars:
            s2 = s1 + c2
            print >>f, s2
            for c3 in chars:
                print >>f, s2 + c3

print "We Made %d Words." % count

Directly calculating hh instead of all of the incrementing is also a big win (about 15% on this laptop). There's also an improvement from using print over f.write, though i have no idea why that's the case. This version runs in about 39 seconds for me.

Answer 7

Here's a short recursive solution.

def AllWords(n, CHARS=[chr(i) for i in xrange(33, 127)]):
    if n == 1: return CHARS
    return (w + c for w in AllWords(n - 1) for c in CHARS)

for i in xrange(1, 5):
    for w in AllWords(i):
        print w

PS: is it an error that character 127 is excluded?

Python optimization

Question

7 answers

solution1
8 2010-03-12 23:54:23

solution2
7 ACCPTED 2010-03-12 14:09:41

solution3
2 2010-03-12 14:02:46

solution4
2 2010-03-12 19:05:10

solution5
1 2010-03-12 14:29:10

solution6
0 2010-03-12 18:08:31

solution7
0

Python optimization

Question

7 answers

solution1 8 2010-03-12 23:54:23

solution2 7 ACCPTED 2010-03-12 14:09:41

solution3 2 2010-03-12 14:02:46

solution4 2 2010-03-12 19:05:10

solution5 1 2010-03-12 14:29:10

solution6 0 2010-03-12 18:08:31

solution7 0

solution1
8 2010-03-12 23:54:23

solution2
7 ACCPTED 2010-03-12 14:09:41

solution3
2 2010-03-12 14:02:46

solution4
2 2010-03-12 19:05:10

solution5
1 2010-03-12 14:29:10

solution6
0 2010-03-12 18:08:31

solution7
0