
How to make this code more efficient in Python?

I'm having trouble running this nested for loop efficiently. I need to run this loop on a string s whose length is about 90,000. Can anyone provide any tips?

This code is supposed to take a string and chop it into pieces of length n, where each piece is a contiguous part of the original string. The program then computes the number of distinct pieces for each n up to the length of the string.

For example, GATTACAT with n = 2 produces the pieces ['GA', 'AT', 'TT', 'TA', 'AC', 'CA', 'AT']. Taking the set of these gives {'GA', 'AT', 'TT', 'TA', 'AC', 'CA'}, and the program returns its length, 6.

The program is to do this from n = 1 to n = len('GATTACAT'), and sum all set lengths.

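sumS = 0
for m in range(1, len(s)+1):  # m is the piece length
    sublist = list()
    for n in range(0, len(s)-m+1):  # n is the start index of each piece
        sublist.append(''.join(ind[n:n+m]))
    sumS += len(set(sublist))  # count the distinct pieces of length m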

thanks!

Some quick ideas come to mind:

slen = 1 + len(s) # do this once, not a bunch of times in the loop
for m in range(1, slen):
    sublist = [''.join(ind[n:n+m]) for n in range(slen-m)] # list comps are usually faster than loops
    sumS += len(set(sublist))

Actually you can probably do it as a larger comprehension:

slen = 1 + len(s)
sumS += sum(len(set(''.join(ind[n:n+m]) for n in range(slen-m))) for m in range(1,slen))

If you have Python 2.7 or later (including Python 3), use a set comprehension instead of the list comprehension above, so the set is built directly without an intermediate list.
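
Here is a rough sketch of what the set-comprehension version might look like. It assumes that ind in the snippets above simply holds the characters of s, so slicing s directly yields the same pieces; the function name count_distinct_substrings is made up for illustration and is not from the original code.

```python
def count_distinct_substrings(s):
    """Sum, over every piece length m, the number of distinct length-m pieces of s."""
    slen = 1 + len(s)  # computed once, as in the snippets above
    total = 0
    for m in range(1, slen):
        # Set comprehension: duplicates are dropped while the pieces are built,
        # so no intermediate list is created.
        pieces = {s[n:n+m] for n in range(slen - m)}
        total += len(pieces)
    return total


print(count_distinct_substrings('GATTACAT'))  # 31
```

Note that this only trims constant factors. It still slices out roughly len(s)**2 / 2 substrings, so for a string of about 90,000 characters it will probably remain very slow.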

>>> s = 'GATTACAT'

>>> [s[i:i+2] for i in range(len(s)-1)]
['GA', 'AT', 'TT', 'TA', 'AC', 'CA', 'AT']

>>> [s[i:i+3] for i in range(len(s)-2)]
['GAT', 'ATT', 'TTA', 'TAC', 'ACA', 'CAT']
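
The same slicing can be wrapped in a small helper and combined with set() to get the counts the question asks for. This is only a sketch; the helper name pieces is hypothetical.

```python
def pieces(s, m):
    """Return every contiguous piece of s that is m characters long, in order."""
    return [s[i:i+m] for i in range(len(s) - m + 1)]


s = 'GATTACAT'
two = pieces(s, 2)
print(two)            # ['GA', 'AT', 'TT', 'TA', 'AC', 'CA', 'AT']
print(len(set(two)))  # 6, because 'AT' occurs twice
```

Slicing the string directly avoids the ''.join(...) call, since a slice of a str is already a str.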

