How to make this code more efficient in Python?

Question

I'm having trouble running this nested for loop efficiently. I need to run this loop on a string s whose length is about 90,000. Can anyone provide any tips?

This code is supposed to take a string and chop it into pieces of length n, where each piece is a contiguous part of the original string. The program then returns the size of the set of pieces for each n up to the length of the string.

For example, GATTACAT for n = 2 would produce ['GA', 'AT', 'TT', 'TA', 'AC', 'CA', 'AT']. Taking the set of these gives {'GA', 'AT', 'TT', 'TA', 'AC', 'CA'}, and the program returns its length, 6.

The program is to do this from n = 1 to n = len('GATTACAT'), and sum all set lengths.

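sumS = 0  # running total of distinct pieces over all lengths
for m in range(1, len(s)+1):  # m is the piece length
    sublist = list()
    for n in range(0, len(s)-m+1):  # n is the start index of the piece
        sublist.append(''.join(ind[n:n+m]))  # ind is not defined here; presumably the characters of s
    sumS += len(set(sublist))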

thanks!

Answer 1

Some quick ideas come to mind:

slen = 1 + len(s) # do this once, not a bunch of times in the loop
for m in range(1, slen):
    sublist = [''.join(ind[n:n+m]) for n in range(slen-m)] # list comps are usually faster than loops
    sumS += len(set(sublist))

Actually you can probably do it as a larger comprehension:

slen = 1 + len(s)
sumS += sum(len(set(''.join(ind[n:n+m]) for n in range(slen-m))) for m in range(1,slen))

If you have Python 2.7 or Python 3, use a set comprehension instead of the list comprehension above; it builds the set directly and skips the intermediate list.
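For illustration, the set-comprehension variant might look roughly like the sketch below. The function name sum_distinct is made up for this example, and ind is assumed to be list(s), since the question never defines it.

```python
def sum_distinct(s):
    """Sum, over every piece length m, the number of distinct length-m pieces of s."""
    ind = list(s)       # assumption: the question's `ind` holds the characters of s
    slen = 1 + len(s)   # computed once, as in the loop version above
    return sum(
        # set comprehension: builds the set directly, no intermediate list
        len({''.join(ind[n:n+m]) for n in range(slen - m)})
        for m in range(1, slen)
    )
```

Called as sum_distinct('GATTACAT'), this adds up the per-length set sizes 4, 6, 6, 5, 4, 3, 2 and 1.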

Answer 2

>>> s = 'GATTACAT'

>>> [s[i:i+2] for i in range(len(s)-1)]
['GA', 'AT', 'TT', 'TA', 'AC', 'CA', 'AT']

>>> [s[i:i+3] for i in range(len(s)-2)]
['GAT', 'ATT', 'TTA', 'TAC', 'ACA', 'CAT']
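The slices above can be wrapped into a small function that works on the string directly, with no ''.join and no separate character list. This is only a sketch; distinct_count and total_distinct are hypothetical names chosen for this example.

```python
def distinct_count(s, n):
    """Number of distinct contiguous pieces of length n in s."""
    # Slicing a str already yields a str, so no join is needed.
    return len({s[i:i+n] for i in range(len(s) - n + 1)})


def total_distinct(s):
    """Sum of distinct_count over every piece length from 1 to len(s)."""
    return sum(distinct_count(s, n) for n in range(1, len(s) + 1))
```

For 'GATTACAT', distinct_count gives 6 for n = 2 and 6 for n = 3, matching the lists shown above once duplicates are removed. Note that this still builds on the order of len(s)**2 / 2 slices in total, so it trims constant factors rather than changing how the cost grows with a 90,000-character string.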

solution1
1 2013-10-15 03:36:35

solution2
0 2013-10-15 03:38:10
