After I added this line of code,
pathResult.append(find_max_path(arr[a + 1:, b + 1:], path, 1))
the function began to run slowly, but without this line it does not work correctly. How can I optimize the code? The function looks for the path with the maximum number of points in a two-dimensional array in which the values equal to 100 lie predominantly on the main diagonal. A row may contain more than one value equal to 100, but each column contains at most one. Full code:
import numpy as np

arr = np.array([
    [000,000,000,000,000,000,000],
    [000,000,000,000,000,100,000],
    [000,000,000,000,000,000,000],
    [000,000,100,000,000,000,000],
    [000,000,000,100,000,000,000],
    [000,000,000,000,100,000,000],
    [000,000,000,000,000,000,000],
    [000,100,000,000,000,000,000]])


def find_max_path(arr, path=None, countempty=0):
    if path is None:
        path = []
    a = 0
    b = 0
    while (a < len(arr)) and (b < len(arr[a])):
        if arr[a][b] == 100:
            path.append({"a": 1 + countempty, "b": 1})
            countempty = 0
            a += 1
            b += 1
            continue
        else:
            check = []
            for j in range(b + 1, len(arr[a])):
                if arr[a][j] == 100:
                    check.append({"arr": arr[a + 1:, j + 1:],
                                  "a": 1 + countempty,
                                  "b": j - b + 1})
                    break
            if not check:
                countempty += 1
                a += 1
                continue
            i = a
            while i < len(arr):
                if arr[i][b] == 100:
                    check.append({"arr": arr[i + 1:, b + 1:],
                                  "a": i - a + 1,
                                  "b": 1})
                    break
                i += 1
            pathResult = []
            for c in check:
                pathNew = path[:]
                pathNew.append({"a": c["a"], "b": c["b"]})
                pathResult.append(find_max_path(c["arr"], pathNew))
            maximum = 0
            maxpath = []
            pathResult.append(find_max_path(arr[a + 1:, b + 1:], path, 1))
            for p in pathResult:
                if len(p) > maximum:
                    maximum = len(p)
                    maxpath = p[:]
            if maxpath:
                return maxpath
            else:
                countempty += 1
                a += 1
    return path


print(find_max_path(arr))
UPDATE1: I added two break statements in the inner loops (execution time was halved).
Output:
[{'a': 3, 'b': 2}, {'a': 1, 'b': 1}, {'a': 1, 'b': 1}]
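One possible way to avoid the exponential recursion is sketched below. It assumes that the goal is the longest chain of cells equal to 100 whose row and column indices both strictly increase, and it returns absolute (row, column) coordinates rather than the relative a/b steps produced by find_max_path. The name longest_diagonal_chain is hypothetical, and this is a reformulation as a small dynamic program, not the original algorithm.

```python
def longest_diagonal_chain(grid, target=100):
    """Return the longest list of (row, col) cells equal to `target`
    whose row and column indices both strictly increase."""
    # Collect matching cells in row-major order, so rows never decrease.
    points = [(r, c)
              for r, row in enumerate(grid)
              for c, value in enumerate(row)
              if value == target]
    if not points:
        return []
    best_len = [1] * len(points)   # longest chain ending at points[k]
    prev = [-1] * len(points)      # index of the predecessor in that chain
    for k, (r, c) in enumerate(points):
        for j in range(k):
            pr, pc = points[j]
            if pr < r and pc < c and best_len[j] + 1 > best_len[k]:
                best_len[k] = best_len[j] + 1
                prev[k] = j
    # Walk back from the end of the longest chain.
    k = max(range(len(points)), key=best_len.__getitem__)
    chain = []
    while k != -1:
        chain.append(points[k])
        k = prev[k]
    return chain[::-1]


grid = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 100, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 100, 0, 0, 0, 0],
    [0, 0, 0, 100, 0, 0, 0],
    [0, 0, 0, 0, 100, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 100, 0, 0, 0, 0, 0],
]
print(longest_diagonal_chain(grid))  # [(3, 2), (4, 3), (5, 4)]
```

The work here is one scan of the array plus a quadratic pass over the cells equal to 100, so the cost no longer grows with the number of recursive branches. The same function should also accept a NumPy array, since it only iterates over rows and compares values.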
UPDATE2
Usage. I use this algorithm to synchronize two streams of information. The rows are words from the text of the book, for which the position in the text is known (L_word). The columns are words recognized from the audiobook, for which the recognized word itself and the time it was spoken in the audio stream are known (R_word). This gives two lists of words. To synchronize these two lists, I use something like this:
from rapidfuzz import process, fuzz
import numpy as np

window = 50
# L_word = ... # words from the text of the book
# R_word = ... # recognized words from the audiobook
L = 0
R = 0
L_chunk = L_word[L:L+window]
R_chunk = R_word[R:R+window]
# cdist takes the result type through the dtype keyword
scores = process.cdist(L_chunk,
                       R_chunk,
                       scorer=fuzz.ratio,
                       dtype=np.uint8,
                       score_cutoff=100)
p = find_max_path(scores)
# ... path processing ...
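To illustrate the kind of matrix that find_max_path receives, here is a small pure-Python stand-in. It assumes that, with score_cutoff=100 and no preprocessing, fuzz.ratio keeps only identical words, so every cell is either 100 or 0. The helper exact_match_scores and the sample words are hypothetical and are not part of rapidfuzz or of the real data.

```python
def exact_match_scores(left_words, right_words):
    """Pure-Python stand-in: 100 where the two words are identical, else 0."""
    return [[100 if left == right else 0 for right in right_words]
            for left in left_words]


# Hypothetical sample data, not taken from the real book or audio.
L_chunk = ["the", "quick", "brown", "fox"]
R_chunk = ["the", "quik", "brown", "fox"]

scores = exact_match_scores(L_chunk, R_chunk)
for row in scores:
    print(row)
# [100, 0, 0, 0]
# [0, 0, 0, 0]
# [0, 0, 100, 0]
# [0, 0, 0, 100]
```

The misrecognized word leaves an empty row and an empty column, which is the kind of gap the path search has to step over.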
As a result of all this work, we get something like this: a video book with pagination and subtitles synchronized with the audio (download, 3 GB).
The Python documentation shows how to do debugging and profiling. Step through the algorithm and time its functions to see where the bottleneck is.
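As a minimal sketch of that advice, the standard-library cProfile and pstats modules can report where the time goes. The function slow_recursive below is only a hypothetical stand-in for an expensive recursive function, and profile_call is a helper name made up for this example.

```python
import cProfile
import io
import pstats


def slow_recursive(n):
    """Hypothetical stand-in for an expensive recursive function."""
    if n < 2:
        return n
    return slow_recursive(n - 1) + slow_recursive(n - 2)


def profile_call(func, *args, top=5):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and keep only the `top` most expensive entries.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


result, report = profile_call(slow_recursive, 18)
print(report)
```

Calling profile_call(find_max_path, arr) in the same way should show how many times the function recurses and how much time each call accounts for.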