When the lengths of topics and lines_data are 100 and 1.5M respectively, this takes far too long. How can I improve its speed?
My code is as follows:
for j, top in enumerate(topics):
    del write[:]
    del ranked[:]
    file.write("\n")
    for i, line in enumerate(lines_data):
        word = line[:18]
        # strip the separator and the trailing ",\n" before parsing the vector
        vector = np.fromstring(line[20:-2], dtype=float, sep=',')
        cos = cosine_similarity(top[1].reshape(1, -1), vector.reshape(1, -1))
        score = cos.reshape(1).tolist()[0]
        if i <= 50:
            ranked.append((top[0], score, word))
            ranked.sort(key=lambda tup: tup[1], reverse=True)
        elif ranked[-1][1] < score:  # compare scores, not the whole tuple
            del ranked[-1]
            ranked.append((top[0], score, word))
            ranked.sort(key=lambda tup: tup[1], reverse=True)
    for rank in ranked[:50]:
        write.append(rank[0] + " " + str(rank[1]) + " " + rank[2])
    file.write("\n".join(write))
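As an aside, much of the runtime here likely comes from calling cosine_similarity once per line and re-sorting the top-50 list on every insertion. A minimal sketch of the vectorized alternative, assuming the line vectors can be parsed once into a single NumPy matrix (the small arrays below are made up for illustration):

```python
import numpy as np

# Illustrative data: 3 "line" vectors and one topic vector.
vectors = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [0.6, 0.8]])
topic = np.array([1.0, 0.0])

# Cosine similarity of the topic against every row at once:
# dot products divided by the product of the norms.
norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(topic)
scores = vectors @ topic / norms

# Indices of the top-2 scores (top-50 in the real code), highest first.
top = np.argsort(scores)[::-1][:2]
```

This replaces 1.5M per-row similarity calls with one matrix-vector product per topic, and one argsort per topic instead of repeated list sorts.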
Try rewriting:

for rank in ranked[:50]:
    write.append(rank[0] + " " + str(rank[1]) + " " + rank[2])

into:

for rank in ranked[:50]:
    elements = [rank[0], str(rank[1]), rank[2]]
    write.append(" ".join(elements))

Repeated string concatenation is really slow, so join should be a speedup.
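To check that the two forms produce identical output, here is a tiny self-contained example (the tuple contents are made up for illustration):

```python
# Hypothetical ranked tuple: (topic id, score, word key).
rank = ("topic_1", 0.87, "some_word_id_000018")

# Repeated + concatenation builds several intermediate strings.
concatenated = rank[0] + " " + str(rank[1]) + " " + rank[2]

# join builds the result in a single pass over the pieces.
joined = " ".join([rank[0], str(rank[1]), rank[2]])

assert concatenated == joined
```

The difference per line is small, but over 50 rows per topic and 100 topics the join version avoids allocating two throwaway intermediate strings on every append.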