Improve speed of a Python double loop over large lists

The lengths of topics and lines_data are 100 and 1.5M respectively. How can I improve the speed of this loop? It currently takes too much time.

My code is as follows:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

write = []
ranked = []
for j, top in enumerate(topics):
    del write[:]    # reset the per-topic buffers
    del ranked[:]
    file.write("\n")
    for i, line in enumerate(lines_data):
        word = line[:18]     # fixed-width word/id prefix
        tostr = line[20:]    # rest of the line is comma-separated vector text
        vector = np.fromstring(tostr[:-2], dtype=float, sep=',')
        cos = cosine_similarity(top[1].reshape(1, -1), vector.reshape(1, -1))
        cos_list = cos.reshape(1).tolist()
        if i <= 50:
            ranked += [(top[0], cos_list[0], word)]
            ranked = sorted(ranked, key=lambda tup: tup[1], reverse=True)
        elif ranked[-1][1] < cos_list[0]:  # compare scores, not whole tuples
            del ranked[-1]
            ranked += [(top[0], cos_list[0], word)]
            ranked = sorted(ranked, key=lambda tup: tup[1], reverse=True)
    for rank in ranked[:50]:
        write.append(rank[0] + " " + str(rank[1]) + " " + rank[2])

    file.write("\n".join(write))
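
One immediate cost in the loop above is that ranked is re-sorted after every insertion. As a minimal illustrative sketch (not a tested drop-in; the names topics, lines_data, and file are taken from the code above), the same top-50 selection can be done with heapq.nlargest from the standard library:

import heapq
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def scored_lines(top, lines_data):
    # Yield (topic, similarity, word) for every line, parsing on the fly.
    for line in lines_data:
        word = line[:18]
        vector = np.fromstring(line[20:-2], dtype=float, sep=',')
        cos = cosine_similarity(top[1].reshape(1, -1), vector.reshape(1, -1))
        yield (top[0], float(cos[0, 0]), word)

for top in topics:
    # nlargest keeps only a 50-element heap while consuming the generator,
    # O(n log 50) overall, instead of re-sorting the list on every insert.
    ranked = heapq.nlargest(50, scored_lines(top, lines_data),
                            key=lambda tup: tup[1])
    file.write("\n")
    file.write("\n".join(r[0] + " " + str(r[1]) + " " + r[2] for r in ranked))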

Try rewriting:

for rank in ranked[:50]:
    write.append(rank[0] + " " + str(rank[1]) + " " + rank[2])

Into:

for rank in ranked[:50]:
    elements = [rank[0], str(rank[1]), rank[2]]
    write.append(" ".join(elements))

Repeated string concatenation creates a new intermediate string at every +, so building the parts in a list and calling join should be a speedup.
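
The join change only trims the output step, though; most of the time goes into calling cosine_similarity once per line, 1.5M times per topic. A hedged sketch of a vectorized alternative, assuming every line's vector has the same length and the resulting matrix fits in memory (names again mirror the question):

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Parse each line once, up front, instead of once per topic.
words = [line[:18] for line in lines_data]
matrix = np.array([np.fromstring(line[20:-2], dtype=float, sep=',')
                   for line in lines_data])

for top in topics:
    # One call scores all lines against this topic at once.
    sims = cosine_similarity(top[1].reshape(1, -1), matrix)[0]
    # argpartition finds the indices of the 50 largest scores in O(n);
    # only those 50 are then sorted in descending order.
    idx = np.argpartition(sims, -50)[-50:]
    idx = idx[np.argsort(sims[idx])[::-1]]
    file.write("\n")
    file.write("\n".join(top[0] + " " + str(sims[i]) + " " + words[i]
                         for i in idx))

cosine_similarity accepts a whole matrix as its second argument, so each topic costs one vectorized call instead of 1.5M Python-level calls, and np.argpartition selects the top 50 without fully sorting all 1.5M scores.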
