
How can I optimize this Python code to run faster?

I'm solving a problem that has time and memory constraints, and unfortunately my solution is failing the time constraint.

I'm fairly new to Python, so any feedback on faster/better methods is appreciated.

This is the problem the program is trying to solve:

Define the similarity of two strings A and B as the length of the longest common prefix that they share, e.g. the similarity of AAAB and AABCAAAB is 2.

The program should output the sum of the similarities of the input string with all of its suffixes, e.g. for AAAB it should output

similarity(AAAB,AAAB) + similarity(AAAB,AAB) + similarity(AAAB,AB) + similarity(AAAB,B) = 4 + 2 + 1 + 0 = 7

The first line of input is the number of strings to be entered, and each subsequent line contains a string to be processed.

import sys

n = int(sys.stdin.readline()) 
A = [0] * n #List of answers

for i in range(1,n+1):
  string = sys.stdin.readline().strip()    
  A[i-1] = len(string)
  for j in range(1, len(string)):
    substr = string[j:len(string)]
    sum = 0
    for k in range(0, len(substr)):
        if substr[k] != string[k]:
            break
        else:
            sum += 1
    A[i-1] += sum

for i,d in enumerate(A):
  print d

In terms of performance, prefer xrange in Python 2.x: it is faster for iterating because it does not build the whole list up front. But the best advice I can give is to use timeit to measure the changes and improvements while tweaking your algorithm.
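As a rough illustration of the timeit suggestion, here is a minimal sketch. The function name similarity_sum and the sample input are made up for this example, and it is written in Python 3 syntax (print as a function, range instead of xrange), so adjust as needed for Python 2.

```python
import timeit


def similarity_sum(s):
    """Brute-force sum of the common-prefix lengths of s with each of its suffixes."""
    total = 0
    for offset in range(len(s)):
        # Compare the string with the suffix starting at offset, character by character.
        for c1, c2 in zip(s, s[offset:]):
            if c1 != c2:
                break
            total += 1
    return total


# Hypothetical test input: a long run of one letter keeps the inner loop busy.
sample = "A" * 300 + "B"

# Time a handful of runs; rerun this after every tweak to compare the numbers.
seconds = timeit.timeit(lambda: similarity_sum(sample), number=5)
print("5 runs took %.4f s" % seconds)
```

Timing the same input before and after each change gives a like-for-like comparison, which is more reliable than guessing which tweak helped.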

Having googled, there's another implementation here: Longest Common substring solution, but the python-Levenshtein library is probably your best bet, as it has a C extension for speed...

The first step is to reduce the amount of indexing you're doing:

import sys

n = int(sys.stdin.readline())

for i in range(n):
    string = sys.stdin.readline().strip()
    sum = 0
    for offset in range(len(string)):
        suffix = string[offset:]
        for c1, c2 in zip(string, suffix):
            if c1 != c2:
                break
            sum += 1
    print sum

This is still O(N^2), though. For O(N), use a suffix tree or array, such as http://code.google.com/p/pysuffix/
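If pulling in a suffix tree library is not an option, one alternative that also runs in O(N) is the Z-algorithm. This is a different technique from the suffix tree/array mentioned above, not what pysuffix does. The Z-array stores, for every position, the length of the longest common prefix of the string and the suffix starting there, so the answer is just the sum of that array. The sketch below uses made-up names (z_array, similarity_sum_linear) and Python 3 syntax.

```python
def z_array(s):
    """Return z where z[i] is the length of the longest common prefix of s and s[i:]."""
    n = len(s)
    z = [0] * n
    if n == 0:
        return z
    z[0] = n  # the whole string matches itself
    left, right = 0, 0  # s[left:right] is the rightmost segment known to match a prefix
    for i in range(1, n):
        if i < right:
            # Reuse the value already computed for the mirrored position inside the window.
            z[i] = min(right - i, z[i - left])
        # Extend the match by direct comparison past what is already known.
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z


def similarity_sum_linear(s):
    """Sum of similarities of s with all of its suffixes."""
    return sum(z_array(s))


print(similarity_sum_linear("AAAB"))
```

For AAAB the Z-array is [4, 2, 1, 0], which matches the 4 + 2 + 1 + 0 breakdown in the question.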

You can try an alternate implementation:

sum(len(os.path.commonprefix([instr,instr[i:]])) for i in xrange(0,len(instr)))

where instr is your input string.
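For reference, a self-contained version of that one-liner might look like the sketch below. The wrapper name similarity_sum_commonprefix is made up for this example, and range replaces xrange so it runs on Python 3. Note that it still compares every suffix against the whole string, so it stays O(N^2) in the worst case.

```python
import os.path


def similarity_sum_commonprefix(instr):
    """Sum the common-prefix lengths of instr with each of its suffixes."""
    # os.path.commonprefix compares the given strings character by character.
    return sum(
        len(os.path.commonprefix([instr, instr[i:]]))
        for i in range(len(instr))
    )


print(similarity_sum_commonprefix("AAAB"))
```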
