How can I optimize this Python code to run faster?

Question

I'm solving a problem that has time and memory constraints, and unfortunately my solution is exceeding the time limit.

I'm fairly new to Python, so any feedback on faster/better methods is appreciated.

This is the problem the program is trying to solve:

Define the similarity of two strings A and B as the length of the longest common prefix that they share, i.e., the similarity of AAAB and AABCAAAB is 2.

The program should output the sum of similarities of the input string with all of its suffixes. For example, for AAAB it should output

similarity(AAAB,AAAB) + similarity(AAAB,AAB) + similarity(AAAB,AB) + similarity(AAAB,B) = 4 + 2 + 1 + 0 = 7
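The definition above can be transcribed directly into a small, deliberately slow reference version, which is handy for checking any faster implementation against. This is a sketch in Python 3; the names similarity and similarity_sum are hypothetical, not from the original problem statement.

```python
def similarity(a, b):
    """Length of the longest common prefix of a and b."""
    length = 0
    # zip stops at the shorter string, so no bounds checks are needed.
    for x, y in zip(a, b):
        if x != y:
            break
        length += 1
    return length


def similarity_sum(s):
    """Sum of similarity(s, suffix) over every suffix of s, including s itself."""
    return sum(similarity(s, s[i:]) for i in range(len(s)))


print(similarity("AAAB", "AABCAAAB"))  # 2
print(similarity_sum("AAAB"))          # 7
```

This is O(N^2) in the worst case (for example, a string of identical characters), so it only serves as a correctness reference.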

The first line of input is the number of strings to be entered, and each subsequent line contains a string to be processed.
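A minimal sketch of parsing that input format, assuming the whole input is available as one string (for example from sys.stdin.read(), which avoids one readline call per line). The function name parse_input is hypothetical.

```python
def parse_input(text):
    """Return the list of strings to process.

    Assumes the format described above: the first line holds the count n,
    followed by n lines with one string each.
    """
    lines = text.split("\n")
    n = int(lines[0])  # int() ignores surrounding whitespace such as '\r'
    # Strip each string line, as the question's code does with .strip().
    return [line.strip() for line in lines[1:n + 1]]


sample = "2\nAAAB\nababaa\n"
print(parse_input(sample))  # ['AAAB', 'ababaa']
```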

import sys

n = int(sys.stdin.readline())
A = [0] * n  # List of answers, one per input string

for i in range(1, n + 1):
    string = sys.stdin.readline().strip()
    # The string is a suffix of itself, with similarity len(string).
    A[i-1] = len(string)
    for j in range(1, len(string)):
        substr = string[j:len(string)]  # suffix starting at position j
        sum = 0
        # Count how many leading characters of the suffix match the string.
        for k in range(0, len(substr)):
            if substr[k] != string[k]:
                break
            else:
                sum += 1
        A[i-1] += sum

for d in A:
    print(d)

Answer 1

In terms of performance, prefer xrange, as it's faster for iterating in Python 2.x (it produces values lazily instead of building a full list). But the best advice I can give is to use timeit to measure the changes and improvements whilst tweaking your algorithm.
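As a minimal sketch of that advice, the standard-library timeit module can compare two variants on the same input. This is written for Python 3, where range is already lazy and xrange no longer exists. The function names and the sample input are hypothetical; only timeit.timeit is a real API here.

```python
import timeit


def with_slices(s):
    # Approach from the question: slice out each suffix, then compare by index.
    total = len(s)
    for j in range(1, len(s)):
        suffix = s[j:]
        for k in range(len(suffix)):
            if suffix[k] != s[k]:
                break
            total += 1
    return total


def with_zip(s):
    # Approach from Answer 2: walk the string and each suffix in parallel with zip.
    total = 0
    for offset in range(len(s)):
        for c1, c2 in zip(s, s[offset:]):
            if c1 != c2:
                break
            total += 1
    return total


sample = "a" * 200  # hypothetical worst case: every suffix matches fully
assert with_slices(sample) == with_zip(sample)  # check correctness before timing

for fn in (with_slices, with_zip):
    # number=10 runs each call ten times; compare totals rather than single runs.
    seconds = timeit.timeit(lambda: fn(sample), number=10)
    print(fn.__name__, "%.4f s" % seconds)
```

Timings vary by machine and interpreter, so the printed numbers are only meaningful relative to each other on the same run.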

Having googled, there's another implementation here: Longest Common substring solution. But the python-Levenshtein library is probably your best bet, as it has a C extension for speed.

Answer 2

The first step is to reduce the amount of indexing you're doing:

import sys

n = int(sys.stdin.readline())

for i in range(n):
    string = sys.stdin.readline().strip()
    sum = 0
    for offset in range(len(string)):
        suffix = string[offset:]
        # zip stops at the shorter sequence, so no explicit index bounds are needed.
        for c1, c2 in zip(string, suffix):
            if c1 != c2:
                break
            sum += 1
    print(sum)

This is still O(N^2), though. For O(N), use a suffix tree or array, such as http://code.google.com/p/pysuffix/
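For this particular sum there is also a simpler linear-time route that needs no suffix tree or suffix array: the Z-algorithm. This is a swapped-in technique, not the one this answer names. z[i] is the length of the longest common prefix of s and s[i:], which is exactly similarity(s, s[i:]), so the answer is the sum of the Z-array. A minimal Python 3 sketch; z_array and similarity_sum_linear are hypothetical names.

```python
def z_array(s):
    """Return z where z[i] is the length of the longest common prefix of s and s[i:].

    By convention z[0] = len(s), since the whole string matches itself.
    """
    n = len(s)
    z = [0] * n
    if n == 0:
        return z
    z[0] = n
    # [left, right) is the rightmost window found so far that matches a prefix of s.
    left, right = 0, 0
    for i in range(1, n):
        if i < right:
            # Reuse the value from the mirrored position inside the window.
            z[i] = min(right - i, z[i - left])
        # Extend the match character by character beyond what is already known.
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z


def similarity_sum_linear(s):
    """Sum of similarities of s with all of its suffixes, in O(len(s)) time."""
    return sum(z_array(s))


print(z_array("AAAB"))                # [4, 2, 1, 0]
print(similarity_sum_linear("AAAB"))  # 7
```

Each character comparison that succeeds moves `right` forward, and `right` never decreases, so the total work is linear in the length of the string.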

Answer 3

You can try an alternative implementation:

import os

sum(len(os.path.commonprefix([instr, instr[i:]])) for i in xrange(0, len(instr)))

where instr is your input string.

3 answers

Answer 1: 2 (accepted), 2012-01-05 09:54:13

Answer 2: 1

Answer 3: 1, 2012-01-05 10:29:20